THE EFFECT OF AGE ON THE STANFORD-BINET
VOCABULARY SCORE OF ADULTS¹


DAVID SHAKOW AND ROSALINE GOLDMAN 
Worcester State Hospital, Worcester, Mass. 


INTRODUCTION 


During the course of the standardization of a memory test for 
adults in the ages ranging from the late teens through the ninth decade, 
it was necessary to have some device for rather quickly determining 
approximate mental level in order to make certain of an equal distri- 
bution of mental ability at the various chronological age levels. 

A survey of the field indicated that such a device might be the 
vocabulary test. Its use for measuring adult intellectual level has 
been suggested* for various reasons, among which is its fairly high 
correlation with intelligence as measured by other devices. Terman
(17, p. 230) found that the vocabulary test had “a far higher
value than any other single test of the scale” and that it gave an IQ
within ten per cent of the whole scale. A close relationship between
mental level and vocabulary was found by Weisenburg, Roe, and
McBride (20, Table XXIX), who report a correlation coefficient of
.81 in their adult group. It has also been claimed that vocabulary 
is independent of the number of years of schooling which an individual 
has had. Terman,'® in defending his stand for the vocabulary test, 
argued that the vocabulary score depended on mental level and was 
little influenced by chronological age. The correlation found between 
vocabulary score and mental age on the Stanford was .91 for children 





¹ From the Research Service of the Worcester State Hospital, Worcester,
Massachusetts. 


* Babcock? and O’Connor™* among others have used it in this way. Intellectual 
level when determined by the vocabulary test must, of course, be considered as 
limited to the verbal aspects of intelligence. 
and .81 for adults; the difference in magnitude of the correlations he
laid to the “motley character” of the adult group.*

Despite the fact that the vocabulary test had not been firmly
established as an adequate test of adult mental level, several
considerations led us, when the occasion mentioned arose, to adopt the
Stanford vocabulary test as an indicator of mental level in adults.† The
vocabulary test appealed to us as being the most adequate for our
purposes because it is easy to administer, it may be presented orally,
visually or both (a consideration which is quite important in the later
decades), and its “intelligence test” nature may at least in part be
disguised. Since it was also planned to use the results obtained from
the vocabulary test in an investigation of “deterioration” in psychosis,
the advisability of using this device seemed further established.

The material collected from the study thus undertaken gave a 
body of data on the vocabulary function which afforded the possibility 
of answering a question heretofore not adequately answered in the 
literature; namely, what is the effect of age on vocabulary? It seemed 
also to afford the possibility for obtaining a good estimate of the 
“average adult” vocabulary level and additional knowledge of the
relationship between vocabulary and educational level. 





*It is interesting to note, however, that Weisenburg, Roe, and McBride 
(20, Table XXIX) report the same coefficient for an adult group which is considered
representative of the average population. The correlations cited are quite high 
and lead one to suspect that they may in part be spurious, since it is likely that 
the mental age was based on a Stanford score which included the vocabulary items. 
The coefficient of .81 obtained by Weisenburg, Roe, and McBride is known to be 
based on the relationship between the total Stanford mental age and vocabulary 
score. The vocabulary test enters into all the upper levels of the Stanford, so 
that unless the scale is evaluated without the credits earned on these tests, it is 
to be expected that the correlation will be affected in an upward direction, espe- 
cially when the upper test levels are used. It appears necessary, therefore, to 
discount partially the relationship between vocabulary and mental age as it is 
presented in terms of correlation coefficients. This is especially true when the 
regression coefficients are not given, since the level of the mental age for given 
vocabulary scores is most important. 

From data in a study by Altman and Shakow! computations of the coeffi- 
cients in a group of fifty-six adult subjects for vocabulary against total Stanford- 
Binet MA and against recalculated Stanford-Binet MA without vocabulary, were 
made. It was found that the coefficient was .06 less in the latter, indicating that 
the error is probably not serious and that the coefficients as given may be accepted 
as approximately correct. 

† The first list only was used by us, the score reported being double the score
obtained on List 1. The administration was the standard one. 
The group on whom complete data were obtained consisted of
three hundred forty-eight subjects, two hundred thirty-two male and 
one hundred sixteen female. They ranged in age from eighteen to 
eighty-nine. The occupational distribution is given in Table IA. 

Not all of the three hundred forty-eight records could be used for a
representative sample of the population since certain selective factors 
had operated to bias the sample. A representative group for the 
present purposes would be one in which the decade samples were of 
the same mental level. The criterion which appealed to us as best 
suited to obtain equality of intellectual level was educational attain- 
ment. Assuming that the intellectual level of the population had not 
changed through the years, then subjects selected from the three 
levels of educational attainment (grammar school, high school, and 
college) in a number proportional to the number in the population 
at their respective decades, would presumably constitute samples 
roughly equated for intellectual level. Since the educational level 
of the American population has increased during the course of the 
years it would be expected that succeeding decade groups of the same 
intellectual level would show increasing educational attainment.* 

In order to obtain the necessary information regarding the educa- 
tional attainment of the population, a search was made through the 
reports of the Commissioner of Education and census reports for the 
years under consideration. However, nowhere were there available 
data describing the actual school attainment of the decades with 
which we were concerned. Standards were finally set up by inferences 
and estimates from school attendance records and from odd items 
obtained from a variety of sources. The final criteria should, therefore,
be considered only as “first approximations,” although it is
probable that they are the best approximations which can be made
from the published data at present available.†





* We were not interested primarily in determining the level of attainment of 
the population as a whole, so no attempt was made to keep the different age groups 
represented in proportion to the general population. However, as will be seen 
later, when a generalization is made for the “population as a whole” the various
decades are given the proper weight in proportion to their respective numbers in 
the population. 

† It does not seem advisable to present the numerous steps gone through by us
in establishing the standards. It may suffice to say that we used the data on 
school attendance and age distribution of the population in the Report of the 
Census of 1930 and whatever information was available in previous census reports; 
advance chapters from the Biennial Survey of Education in the United States 
Our work was considered as being done in the year 1930; actually
it was in progress from 1928 to 1933. We were interested in determining
for each decade group the per cent incidence of grammar-school, 
high-school and college* education for the period when all subjects 
had had an opportunity to pass through the complete school-age range. 
The years from five to twenty-four were considered the school-age 
years. It was necessary, for example, in the forty to forty-nine-year
group to know what the educational attainment of the population 
just having passed through the school age was during the years 
1886-1914, the period during which all of the group would have been 
through school. In order to simplify the computations it was deemed 
desirable to use the middle of the period as a reasonable average base, 
e.g., for the group under discussion, the year 1900. For the eighty to
eighty-nine-year group 1860 was considered as the representative
year and for the twenty to twenty-nine-year group, 1920. For the
eighteen to nineteen-year group it was, of course, necessary to take into
consideration the fact that the group had not as yet had the opportun- 
ity to live through the complete school age range. The distributions 
finally decided upon as being most nearly correct are given in Table I. 
They indicate that the educational level of the population has changed 
considerably in the period under discussion, a factor which would 
have to be seriously considered in the selection of any sample of adults 
by educational criteria for the study of psychological functions. 
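The dating rule described above can be sketched in a few lines. This is only an illustrative reading of the text: the function name and its parameters are hypothetical, and the rule assumed is that the period runs from the year the oldest member of a group reached age five to the year the youngest member reached age twenty-four, with the midpoint taken as the representative year.

```python
SURVEY_YEAR = 1930        # base year adopted in the text
SCHOOL_AGE = (5, 24)      # school-age years as defined in the text


def school_period(age_low, age_high, survey_year=SURVEY_YEAR, school_age=SCHOOL_AGE):
    """Return (first_year, last_year, midpoint_year) for one age group.

    Hypothetical helper: first_year is when the oldest member of the group
    reached the start of school age, last_year is when the youngest member
    reached the end of it.
    """
    start_age, end_age = school_age
    first_year = survey_year - age_high + start_age
    last_year = survey_year - age_low + end_age
    midpoint_year = (first_year + last_year) // 2
    return first_year, last_year, midpoint_year


# Print the period and representative year for the groups named in the text.
for low, high in [(80, 89), (40, 49), (20, 29)]:
    print(f"{low}-{high}: {school_period(low, high)}")
```

Under these assumptions the forty to forty-nine-year group yields 1886, 1914 and a midpoint of 1900, and the other two groups yield midpoints of 1860 and 1920, in agreement with the years quoted in the text.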

When the distribution criteria set up were met, the three hundred 
forty-eight available subjects were reduced to two hundred three— 
one hundred thirty-two male and seventy-one female. 
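One simple reading of "a number proportional to the number in the population" is a rounded quota per educational level within each decade. The sketch below is an assumption about how such quotas might be computed, not the authors' documented procedure; the function name is hypothetical, and the percentages and group sizes are those of Table I.

```python
def proportional_quota(n_subjects, percentages):
    """Round n_subjects * percentage / 100 for each educational level.

    Hypothetical helper: plain rounding is assumed, so the quotas need not
    sum exactly to n_subjects for every decade.
    """
    return [round(n_subjects * p / 100) for p in percentages]


# Percentages are (grammar school, high school, college) from Table I.
print(proportional_quota(20, (65, 25, 10)))   # thirty to thirty-nine-year group
print(proportional_quota(68, (46, 41, 13)))   # twenty to twenty-nine-year group
```

For these two decades the rounded quotas coincide with the numbers of subjects actually used (13, 5, 2 and 31, 28, 9). For some other decades the numbers used differ from the rounded quota by a subject, so the rule should be taken only as an approximation of the selection.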

The occupational distribution of the selected as well as of the 
original sample is given in Table IA. 





for 1932-1934, and also previous Biennial Surveys. Phillips’! data and Judd’s 
statement in “Recent Social Trends,”® gave additional information. Since there
were many discrepancies in the data, in each case it was necessary to accept what 
seemed to be the most reasonable figures. The percentages for the decades for 
which the most satisfactory data were available, were plotted and smoothed curves 
were drawn through the points—the smoothing being done graphically. That 
our standards are approximately correct is indicated by the fact that the average 
educational level of our group twenty years and over (weighted for proportionate 
representation in the population) agrees quite closely with the estimate made by 
Foster‘ of the education of the population twenty-one years old and over, when his 
figures are corrected to eliminate illiteracy. The latter was necessary since our 
norms were established for the literate population. 

* “Grammar school,” “high school,” and “college” are used throughout to
indicate some attendance at rather than graduation from these institutions. 
TABLE I.—ESTIMATED PERCENTAGE DISTRIBUTIONS OF SCHOOLING IN THE
GENERAL POPULATION OF 1930 BY DIFFERENT AGE LEVELS, TOGETHER
WITH THE ACTUAL NUMBER OF SUBJECTS USED IN THE PRESENT
STUDY OF THE DIFFERENT AGE AND EDUCATIONAL LEVELS

          Grammar school        High school           College           Total
Ages     Per cent  No. used   Per cent  No. used   Per cent  No. used   No. used
18-19       23        6          62        15         15        3          24
20-29       46       31          41        28         13        9          68
30-39       65       13          25         5         10        2          20
40-49       79       20          14         4          7        2          26
50-59       89       20           8         2          5        1          23
60-69       94       22           4         1          2        1          24
70-79       95        0           4         0          1        0           0
80-89       96       17           4         1          0        0          18
18-89                129                    56                  18         203



TABLE IA.—OCCUPATIONAL STATUS OF THE SELECTED GROUP OF SUBJECTS
(N = 203) AND OF THE ORIGINAL GROUP OF SUBJECTS (N = 348)

                                         Selected sample      Original sample
                                           M.       F.          M.       F.
[label illegible]                          26       32          44       49
Factory workers, mechanics, etc.           18       12          42       14
[label illegible]                          15        0          16        0
[label illegible]                           5        0           5        0
Clerks and salespeople                      7        6          14       14
[label illegible]                           0        7           0        9
Managers and supervisors                    1        2           7        2
[label illegible]                           3        1           9        2
Teachers, ministers, etc.                  10        3          15        8
[label illegible]                   [illegible]      4          11       14
Occupation unknown (delinquents)†   [illegible]      0          46        0
[label illegible]                          15        0          23        4
Total                                     136       67         232      116


* Aged subjects retired from active participation in affairs for some time
previous to the examination. The occupations represented were predominantly
of an “average” nature, e.g., machine-shop, factory, watchmaking, weaving,
storekeeping, etc.

† A group of adult delinquents, mostly in the 20’s and 30’s, meeting the
intellectual and educational standards of the average population.
RESULTS


The results obtained from the selected group of two hundred three 
subjects are given in Table II. The constants for the eighth decade 
are not presented, since its subjects were not available in sufficient 
number to be distributed through the proper educational levels. 


TABLE II.—DISTRIBUTION OF MEAN EDUCATION IN GRADES AND MEAN
VOCABULARY SCORES OF SELECTED ADULTS BY DECADES

Age                  Education                 Vocabulary score
level     N       Mean         SD            Mean           SD
18-19    24    10.4 ± 0.5   2.2 ± 0.3     56.4 ± 2.4    11.3 ± 1.7
20-29    68     9.9 ± 0.4   2.8 ± 0.3     56.9 ± 1.5    12.5 ± 1.1
30-39    20     9.3 ± 0.9   3.8 ± 0.6     57.0 ± 3.8    16.5 ± 2.7
40-49    26     8.5 ± 0.7   3.3 ± 0.5     57.8 ± 3.2    15.5 ± 2.2
50-59    23     7.1 ± 0.6   2.6 ± 0.4     58.7 ± 2.3    10.6 ± 1.6
60-69    24     7.1 ± 0.5   2.5 ± 0.4     53.0 ± 2.8    13.5 ± 2.0
70-79
80-89    18     5.1 ± 0.7   2.9 ± 0.5     50.6 ± 4.5    18.6 ± 3.2
18-89   203     8.6 ± 0.2   3.3 ± 0.2     56.1 ± 1.0*   14.0 ± 0.7



* This is the mean vocabulary score of the selected group of two hundred three 
subjects. When each decade is weighted according to its per cent incidence in 
the total white population (Fifteenth Census, 1930, p. 581) the mean becomes 56.5. 
Since the difference is negligible, we shall consider our total group as an adequate 
sample of the total white population for the present purposes. 
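The two means mentioned in this footnote are both weighted averages of the decade means, differing only in the weights. The following sketch is illustrative: the function name is hypothetical, the decade values are read from Table II, and the census proportions used by the authors are not reproduced in the text, so they are left to the caller rather than guessed here.

```python
# (age level, N, mean vocabulary score) as read from Table II;
# the 70-79 decade is omitted because no constants are reported for it.
TABLE_II = [
    ("18-19", 24, 56.4),
    ("20-29", 68, 56.9),
    ("30-39", 20, 57.0),
    ("40-49", 26, 57.8),
    ("50-59", 23, 58.7),
    ("60-69", 24, 53.0),
    ("80-89", 18, 50.6),
]


def weighted_mean(values, weights):
    """Weighted average of values; weights need not sum to one."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


means = [mean for _, _, mean in TABLE_II]
sizes = [n for _, n, _ in TABLE_II]

# Weighting by sample size gives the mean of the pooled selected group.
pooled = weighted_mean(means, sizes)
print(round(pooled, 2))
```

Weighting by sample size gives a value close to the reported 56.1 (small differences arise because the decade means are themselves rounded). Passing census proportions in place of `sizes` would give the population-weighted figure the footnote reports as 56.5.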


It will be noticed from the table that the mean vocabulary score 
for the whole group is 56.1.* When considered by decades the score 
is seen to remain at practically the same level through the sixth decade 
and to drop from then on. It would seem from these data, based on 
consistency of trend rather than on critical ratios,† that vocabulary
ability is not affected by age until the seventh decade, when it seems 
affected adversely, a process which continues with increasing age. 
The correlation ratio of age and vocabulary is -.10.‡





* Since the means and medians respectively for the sexes were practically 
identical no separate treatment was considered necessary. The means for 
male and female, respectively, were: 56.4, 55.4. 

† The critical ratios are not significant for any pair of age levels.
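A critical ratio in the usual sense is the difference between two means divided by the standard error of that difference. The sketch below assumes, since the text does not say so explicitly, that the ± values attached to the means in Table II are standard errors and that the two groups are independent; the function name is hypothetical.

```python
import math


def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means over the standard error of the difference.

    Assumes independent groups, so the standard errors combine in quadrature.
    """
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# The widest gap between decade means in Table II: 50-59 against 80-89.
cr = critical_ratio(58.7, 2.3, 50.6, 4.5)
print(round(cr, 2))
```

Even for this widest gap the ratio comes to about 1.6, well below the value of roughly 3 that was a common rule of thumb for significance in the period (the text does not state the threshold it used), which is consistent with the statement that no pair of age levels differs significantly.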

‡ After completion of this article the footnote on p. 303 of Terman and Merrill’s
Measuring Intelligence (Boston: Houghton Mifflin, 1937) came to our attention. 
As nearly as we can tell from the short statement there given, the results obtained 
The presentation of the data has involved the assumption that
education, insofar as it reflects intellectual level, plays an important 
role in determining vocabulary scores. When the age groups are 
more or less equalized for intellectual level by obtaining representative 
educational samples for each decade group, vocabulary is found not 
to be affected by age until the seventh decade. In order to determine 
the relationship between educational level and vocabulary, a separate 
analysis by decades was made of the complete group of three hundred 
forty-eight subjects at the three educational levels. This material 
is presented in Table III. Whereas Table II gives the material for 
decade groups selected in such a way as to equate them roughly for 
intellectual level, Table III gives the material for age groups equated 
for educational level. 

It will be seen from Table III that the vocabulary score is definitely 
related to educational level. The means of the high-school group are 
higher than those of the grammar-school group, and those of the 
college group higher than those of the high-school group at each 
decade. The critical ratios indicate significant differences in the total 
groups and at each decade where the number with any legitimacy per- 
mits of comparison.* The correlation of education and vocabulary 
is .64. This degree of relationship is due, we believe, almost entirely 
to the indirect effect of mental level. 

More important for our purposes, however, is the difference in the 
trends of the constants for vocabulary in Tables II and III. Whereas
the former indicates a flat curve with a drop beginning about the 
seventh decade, curves drawn for the data in Table III would all have 
decided upward trends through at least the fifth decade with drops 
at either the sixth or one of the two succeeding decades. The tend- 
ency is perhaps most clearly indicated by the high-school group which 
is most uniform in educational level through the decades. It is seen 
that in this group the trend is consistently upward until the eighth 





on one hundred ten adults ranging in age from nineteen to eighty-four are not 
inconsistent with ours in relation to the effect of age on vocabulary. The correla- 
tion reported is .09. 

* Between grammar-school and high-school groups the critical ratios run in the 
neighborhood of ten for the total groups, and except at the seventy—seventy-nine- 
year level, where the ratio is 2.9, the lowest ratio is 3.7. Between the high-school 
and college group only one ratio could be determined, that for the twenty-twenty- 
nine-year level, where it is 8.0. As between age groups within an educational 
level, no significant critical ratios were found for successive age levels, although 
the extremes give significant differences. 
decade and only then does it begin to drop. Presumably it is in this
region that age begins to have an effect, or at least has a greater effect
than that due to selectively higher mental levels. Our assumption
that keeping the number of years of schooling constant results in an
increasingly higher intellectual level in successive decades, with a resultant
pseudo-correlation of age and vocabulary score, seems borne out, and
justifies the procedure followed of equating intellectual level by taking
representative educational samples for each decade.

For those interested in the use of the vocabulary test as an adult 
mental test the following decile norms derived from our data are 
presented: 





Decile    1    2    3    4    5    6    7    8    9    10
Score    39   44   48   51   54   57   62   67   75   75+
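The decile norms can be turned into a simple lookup. The sketch below assumes that each tabled score is the upper limit of its decile (so that a score of 39 or less falls in the first decile and anything above 75 in the tenth); the text does not state how boundary scores are to be treated, and the names used are hypothetical.

```python
from bisect import bisect_left

# Upper score limits for deciles 1 through 9, from the norms above;
# decile 10 is everything above the last limit.
DECILE_UPPER_LIMITS = [39, 44, 48, 51, 54, 57, 62, 67, 75]


def decile(score):
    """Return the decile (1-10) in which a vocabulary score falls.

    Assumes a score equal to a tabled limit belongs to that decile.
    """
    return bisect_left(DECILE_UPPER_LIMITS, score) + 1


for s in (39, 40, 56, 75, 76):
    print(s, decile(s))
```

On this reading the group mean of 56.1 falls in the sixth decile, just above the fifth-decile limit of 54.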





























DISCUSSION 


The “average” Stanford vocabulary scores for adults as reported
in the literature are in fairly close agreement with our results. Compared
with our mean of 56.1 (SD 14.0), Weisenburg, Roe, and McBride
(20, p. 55) report 54.5 words (SD 15.6) to be the mean vocabulary
score for a group of sixty-nine general hospital patients whose mean 
educational level was 8.1 compared to ours of 8.6. Their mean for 
List 1 alone was 55.5.* Fry® reports 51.7 words (SD 16.6) as the 
mean for a group of two hundred twenty-seven white prisoners whose 
mean chronological age was 31.7 years and whose mean MA was 
155.3 months. The one hundred forty delinquents studied by Shakow 
and Millard’® had a mean vocabulary score of 43.2 (SD 13.7). The 





* Louden”? indicates that List 1 is easier than List 2 at the lowest mental levels, 
approximately equal at the fourteen-year level, and more difficult at the higher 
levels. Because of the “average” nature of our group it seems reasonable to accept
the scores obtained by us on the vocabulary, although they are based on List 1 
only. Two other studies on the average adult range in which both lists were given 
corroborate the equality of the two Stanford-Binet lists in the range within which 
we are working. The first is the study of Weisenburg, Roe, and McBride (20, p. 56)
in which a correlation coefficient of +.95 was found between the two lists, the 
means being for List 1, 27.76 and for List 2, 26.34. The second, an unpublished 
study from our laboratory on thirty-seven male adults, gives a mean for List 1 of 
28.46 and for List 2 of 28.54. 

† Computed from the original data and not given in the mentioned report.
This result is lower than that found by other investigators and is probably due to 
the lower intelligence level of this group (Mean MA 148.8 months). It may be 
mean vocabulary score for four hundred eighty-two “miscellaneous
adults” studied by Terman,'* which included business men (ages 
twenty-five to sixty-five, common school education, with a mean MA 
of sixteen years) and hoboes (mean MA fourteen years), is estimated 
to be 52.5 on the basis of the data he makes available. Babcock 
(2, p. 90) presents her data in such form that only an approximate
figure can be determined for her normal population—the mean lies 
somewhere between fifty-one and fifty-seven words. Our results are 
thus seen to corroborate roughly the findings of other investigators 
whose groups are in general comparable. They are seen to be closest, 
however, to those of Weisenburg, Roe, and McBride,²⁰ the only study
in which pains were taken to obtain a good “average” sample, and for
that reason probably the one giving the most dependable results. 

In addition to the studies mentioned above in which the Stanford- 
Binet vocabulary was given to adults, there are reports available which 
discuss vocabulary performance but in which the data are not presented 
in a form which makes direct comparison possible. Among these are 
the studies of Grierson and Rixon,’ Terman, Knollin, et al.,! and 
Beeson.* Grierson and Rixon report the general intelligence level of 
a group of two hundred delinquents as being fourteen years (equivalent 
to between fifty and sixty-four words) when measured either by the 
vocabulary test or the whole scale. The group of business men and 
hoboes studied by Knollin, already mentioned earlier, were used by 
Terman in standardizing the upper levels of the Stanford test. All 
of the thirty business men (MA sixteen years) passed fifty words and 
fifty per cent passed seventy-five words; of the one hundred fifty hoboes 
(MA fourteen years) eighty per cent passed fifty words. Beeson 
examined twenty inmates of a home for aged people, who ranged in 
age from fifty-nine to ninety-three. From the personal sketches 
of the subjects which he presents, the “normality” of many of them
may be questioned. He found that all subjects passed the vocabulary 
test at thirty words; eleven having vocabulary scores of sixty-five 
or over. Although less weight may be placed on these studies, they 
give some additional corroboration to the results of the investigations 
reported above that the average adult Stanford vocabulary level is 
in the region of fifty-five words. 

The question with which we were mainly concerned, however, was 
not the average vocabulary level of adults, but rather the effect of age 





pointed out here that a certain number of average level subjects of this group were 
used in the present study. 
on vocabulary. Our results for a representative sample of the population
(Table II), it will be remembered, indicated no effect of age from
eighteen to sixty, with a subsequent decline to ninety. Of the studies
in the literature involving the age factor in vocabulary in adults only
three are concerned with the test used by us, i.e., the Stanford-Binet
vocabulary.

Weisenburg, Roe, and McBride (20, p. 73) for sixty-nine selected
cases find practically no correlation (+.16) with age for the adult 
range twenty to fifty-nine. There is a rise at thirty to thirty-nine, 
and no change thereafter. It will be seen that within the limits of the 
range covered by them there is close correspondence to our results— 
in fact, for the three upper decade levels the means are surprisingly 
similar to ours. 

The other two studies are not concerned with the adult range itself, 
but rather compare adults with children. Although Terman" argues 
against the influence of chronological age on vocabulary, his median 
vocabulary scores indicate from the consistency of the findings a 
difference in favor of the adults. The difference is found throughout 
the mental age range, but is relatively marked at fifteen and above.* 
McFadden¹¹ compares normal children with feeble-minded children of
the same mental ages, and older and younger feeble-minded of the same 
mental ages. In the case of the former pair he finds vocabulary to be 
higher in the feeble-minded, who were, of course, chronologically older, 
and in the case of the latter pair the scores were in favor of the older 
feeble-minded. He concludes that although “vocabulary is much
more highly correlated with mental age than with chronological
age . . . if mental age be held constant, those subjects with higher
chronological age . . . will score higher than those subjects with lower
chronological age . . .”

If one accepts the findings of these two studies, it seems that age 
has the effect of increasing vocabulary score, which would appear to 
contradict the findings presented in Table II. The apparent contradiction



* It might be indicated in passing that considerable question may be raised
as to the “average” nature of the group of four hundred eighty-two adults used
for standardizing the Stanford-Binet. Weisenburg, Roe, and McBride (20, p. 55),
whose group has an average MA of 13.6, give an average vocabulary rating of
54.5. Our selected group of presumably average intelligence gives a vocabulary
score of 56.1. Terman’s group gives a median vocabulary of 49.6 for the thirteen-
year MA level and 52.3 for the fourteen-year level. This discrepancy may account
for the considerably higher MA equivalents for adults on the Stanford-Binet
vocabulary when compared with other tests of the scale.

may perhaps be accounted for on the basis of the range of ages
covered. In both studies children were compared with adults, whereas 
our study was limited to the adult range. There remains some incon- 
sistency, however, which may, we believe, be understood from a 
consideration of the difference between the vocabulary test, on the 
one hand, and the bulk of the other tests in the Stanford-Binet, on 
the other. It has been the experience of workers with the Stanford- 
Binet that although a considerable number of the items are affected 
favorably by recent schooling, experience in examination situations, 
etc., the vocabulary is probably least affected in this way. A reason- 
able explanation may be that the vocabulary differences found between 
children and adults (and between the younger and older feeble-minded) 
are indirectly determined. They are due not to actual increase of 
vocabulary with increasing chronological age, but rather to a decrease 
in achievement level on the other tests of the scale because of the 
greater effect on the latter of factors such as those just mentioned. 
The net result of this would be a relatively greater vocabulary score. 
Actually, however, one would be comparing adults who were probably 
of a mental age level somewhat higher, i.e., originally more nearly
equivalent to their vocabulary rating, with children who have a 
vocabulary rating equivalent to their mental age level. The evidence 
from various sources, some of which will be presented below, and 
from the vocabulary-mental age discrepancy found in psychotic 
subjects lends weight to this argument. 

The other studies to be discussed, although not concerned with 
the Stanford-Binet vocabulary test, throw additional light on the age- 
vocabulary relationship. The results are only indirectly comparable 
because multiple-choice tests are involved. In two studies the 
vocabulary section of the Minnesota College aptitude test was used. 
Sorenson!* reports results from a group of six hundred forty-one 
university-extension students up to age sixty-nine who were given this 
test, in which increasing vocabulary score is found for successively 
older age groups. These results are quite consistent with our data 
for college and high-school groups presented in Table III. Our 
interpretation of this, as has already been indicated, is that there is an 
increasing selectivity of subjects of higher intellectual level with 
increasing age, the result being an increase in vocabulary score which 
is apparently due to age. 

Christian and Paterson,‘ using the same test on one hundred 
twenty-nine parents and relatives of college freshmen ranging in age 
from forty to sixty-nine, also find a definite increase of vocabulary up to
sixty years of age, and even up to seventy. They point out that they 
are dealing with a superior population and are inclined to explain 
their results on this basis, indicating at the same time that the results 
may not hold for average or below average groups. These results, like 
Sorenson’s, are consistent with the data given in Table III for the high- 
school and college groups. It is seen, however, that our grammar- 
school group shows a similar, though less steep, increase up to sixty, 
so that the conjectures of these authors are not borne out. It thus 
seems more reasonable to explain the increasing vocabulary by the 
inevitable selectivity of mental level which comes with keeping educa- 
tional level constant through the decades, rather than to actual increase 
of vocabulary with increase in age. 

O’Connor!? used the G. E. Work-sample 95. He reports an 
increasing vocabulary from twelve to twenty-two at a rather steady 
rate, and a continued increase to fifty at a slower rate. He also 
reports on the vocabulary scores of “equally successful” business
executives in the upper decades!* among whom he finds no effect of 
age on vocabulary, which is consistent with our findings on the selected 
group (Table II). Here our findings are corroborated by another set 
of selective criteria. Whereas in our study an attempt was made to 
equalize mental ability through the decades by equating educational 
standards, O’Connor presumably equated mental ability by equating 
ability in the sense of “equal successfulness.” Although the standards
are obviously rough in both cases, their corroboration of each other 
lends some additional weight to the legitimacy of the selective 
principles. 

Some data are found in the Jones and Conrad? study of a New Eng- 
land community in which they gave the Army Alpha Test to one 
thousand one hundred ninety-one subjects ranging in age from ten to 
sixty. Subtest 4 (the synonym-antonym test) is the nearest to a 
vocabulary test contained in the Alpha. This test did not show the 
post-adolescent decline with age found in all the other tests except 
Subtest 8 (general information). Insofar as these results should not 
be discounted because of the multiple-choice procedure and the 
limited sense in which Subtest 4 is a vocabulary test, they may be 
considered corroborative of our findings for the selected group. 

The last studies which need mention in this connection are the 
Stanford Maturity studies. On the basis of their results on a short 
form of the Otis Self-Administering test with eight hundred twenty- 
three subjects ranging in age from seven to ninety-four, Miles (1s,
p. 16) indicates that verbal associations and interpretations of mean- 
ings show marked resistance to the influence of age. Although the 
Miles’ data do not lend themselves to direct comparison, their material 
to this extent is also corroborative of our findings. 

It would thus seem that when the samples are more rigorously 
selected and more closely examined, vocabulary is a function which 
in adult years shows neither an increase nor a decrease with age at 
least up to sixty, after which age there is some evidence of a slow 
decline. Because of various considerations mentioned earlier, but 
mainly because of its relatively unchanging character, the vocabulary 
test appears to be an adequate device for determining adult intel- 
ligence. On the one hand it is little affected either by age or, in the 
usual course of events, by practice over and above that which is 
correlated with the ordinary experience of persons of varying intel- 
lectual capacity; on the other hand, it is little affected by the increasing 
separation from formal training, both as to content and setting which 
comes with age—a factor playing a considerable réle in most testing 
devices. 


SUMMARY 


(1) The Stanford-Binet vocabulary test was given to two hundred 
three subjects of the age range eighteen to ninety selected in each 
decade to give a representative sample of its educational attainment. 

(2) When thus equated indirectly for mental level, vocabulary 
score was found to remain constant at a level of about fifty-seven 
words through the seventh decade with a slow decline thereafter. 

Note. In the review of the literature for the present paper, the 
studies of J. G. Gilbert (21, 22) were inexcusably overlooked. They 
came to our attention only when proof was being read. It is the 
purpose of the present note to attempt to account for an apparent 
discrepancy between her findings and ours. 

The average vocabulary levels of both groups used by Gilbert 
are considerably higher than for our corresponding age groups. When 
her means are refigured with a limiting upper mental age level of 
twenty, it is found that her twenty-year group has a mean vocabulary 
rating of 16.4 and her sixty-year group a rating of 16.0. These, by 
Babcock norms, are equivalent to sixty-four and sixty-two words 
respectively. When compared to the means taken from various 
sources, including ours, (see discussion earlier in this paper) the 
conclusion cannot be escaped that Gilbert was working with better- 
than-average groups. This is corroborated by the high Barr ratings 
of both of her groups: 11.1 (twenty-year group) and 10.4 (sixty-year 
group). These may be compared with the Weisenburg, Roe and 
McBride (*°, p. 43) mean rating of 8.3. 

Her assumption that vocabulary level in the sixties has not dete- 
riorated and may therefore be used as a basis for equating the sixty- 
year group with a twenty-year group, is not consistent with our 
findings. Her use of occupation for roughly matching the groups is 
not justified on the same grounds that she herself recognizes as invalid 
with regard to educational level (21, p. 13). The fact that her Barr 
ratings and vocabulary ratings are so nearly alike in the two groups 
leads one to suspect that her sixty-year group was originally of higher 
level and that the similar vocabulary levels at the present time are 
to be accounted for by decline in this group due to age. 


ACHIEVEMENT OF STUDENTS IN SUBFRESHMAN 
COMPOSITION¹ 


CURTIS E. AVERY AND E. G. WILLIAMSON 


University of Minnesota 


Freshmen who enter the University of Minnesota are divided into 
four groups according to their proficiency in English and their aptitude 
for college work in general, as determined by a series of placement tests 
supplemented by the high-school record of each student. The first 
of these four groups is composed of those students (about twelve per 
cent of the entering freshmen) who are exempted from the college 
requirement in English. The second and third groups are made up of 
students assigned to English A-B-C, and students assigned to Composi- 
tion 4-5-6,² approximately seventy-four per cent of the freshmen being 
divided about equally between the two courses. The fourth group is 
composed normally of about fourteen per cent of the entering class, 
and is made up of students who are adjudged deficient in their prepara- 
tion for college English work. These students are assigned to a special 
course called Subfreshman Composition. 

A detailed explanation of the system used to make these four divi- 
sions has been published by the University of Minnesota.³ What 
concerns us immediately is a description of the fourth group, the 
subfreshmen. The University publication mentioned above describes 
this group as follows: 


IV. Subfreshman.—Those required to take a course in Subfreshman 
Composition until such time as they are able to pass qualifying tests for 
Composition 4-5-6. This course carries no credit in the college. It is given 
under the direction of the Extension Division, and students are required to 
pay a special fee as in any other extension course.⁴ 


The “qualifying tests for Composition 4-5-6” mentioned above are 
not standardized. Actually, the criteria for qualification are tacitly 
assumed to be implicit in the subfreshman course itself. The general 





1 Grateful acknowledgement is made for the statistical assistance furnished by 
the Minnesota W.P.A. Project No. 6094-4214. 

2 English A-B-C is a five-credit course involving a study of both literature and 
composition. Composition 4-5-6 is, as its name implies, basically devoted to drill 
in writing and the principles of rhetoric. 

3 “The Placement System Used by the Department of English.” The Bulletin 
of the University of Minnesota, Vol. XXXIX, No. 53, October 21, 1936. 

4 Ibid., page 9. 
practice has been to give a final examination on the content of the 
subfreshman course, and to require as a part of this examination an 
impromptu theme and a prepared theme. On the basis of this 
examination and these themes, from seventy per cent to eighty per cent 
of the subfreshman students in the Fall-quarter classes have been, 
within recent years at least, allowed to register for Composition 4.¹ 

It will be germane to the subject, perhaps, to explain here some of 
the characteristics and mechanics of the subfreshman course. A 
University ruling makes it impossible to utilize full-time members of 
the regular English staff as instructors. For this reason the subfresh- 
man teaching staff is normally made up of graduate students who have 
some experience in teaching, together with part-time or occasional 
instructors with more experience. As far as possible subfreshman 
instructors who have firsthand knowledge of the work in Composition 
4 are preferred to those who have no such knowledge. It must in 
fairness, however, be admitted that this is not always possible. The 
personnel of the subfreshman instructorial staff is not normally con- 
stant from one year to the next, nor from one quarter to the next. 

Subfreshman Composition classes use as a text one of the standard 
handbooks of grammar and composition. Classes meet three hours a 
week for one quarter; the students are drilled in elementary grammar, 
and are required to do exercises in sentence diagramming, sentence 
construction, punctuation, and usage. Each student writes from 
eighteen to twenty-one themes, including the two which constitute 
part of the final examination. Many of these themes are criticized in 
class, and all of them are revised by the student in accordance with the 
instructor’s directions. 

The program of study is mapped out by the instructorial staff, 
meeting as a committee under the direction of the instructor in charge 
of extension English classes. The examinations are prepared by this 
committee. The final examinations and the final themes are so 
arranged that no instructor grades the work done by his own students. 
The final grades, and all debatable grades on examinations and themes, 
are reviewed by the instructor in charge. 

As has been pointed out, qualification for Composition 4 is assumed 
when the student passes Subfreshman Composition. In other words, 
the student is passed from Subfreshman Composition to Composition 4 





1 Winter- and Spring-quarter classes of Subfreshman Composition contain a 
larger number of students who are repeating the course after failure, and hence 
graduate a larger percentage to Composition 4. 
on different evidence from that which led to his assignment to Sub- 
freshman Composition originally, although the content of the subfresh- 
man examination is, in general, similar to that of the placement tests. 
The daily work of the student in the subfreshman course, and estimates 
of the improvement which he has shown are taken as only incidental 
factors in judging his fitness for Composition 4. These factors are 
estimated only subjectively. Quite obviously, certain important 
questions grow from the situation as outlined above. 

First: Does the subfreshman course actually succeed in preparing 
students to pass the tests which originally kept them from being 
admitted to Composition 4? Second: If the course fails in this, does 
it, nevertheless, prepare the student successfully to do the work 
in Composition 4? Third: What does Subfreshman Composition 
accomplish? 

To attempt at least a partial answer to these questions, as well as 
to other less fundamental but equally interesting ones, the General 
Extension Division and the University Testing Bureau collaborated 
in an experiment during the Fall quarter of the academic year 
1935-1936, and the Fall quarter of the year 1936-1937. As will be seen 
later, some suggestive information not in direct answer to the questions 
outlined above was a by-product of the experiment. 

The mechanics of this experiment were simple. Students in the 
1935 Fall quarter Subfreshman Composition classes were given the 
entire sequence of placement tests² at the end of the course, and the 
results were compared with the results of the tests which the students 
took before coming to the University. The following year, in the 1936 
Fall quarter, the same procedure was followed. The classes in this 
quarter, however, used a different text from that used the preceding 
year, spent a great deal more time in “old-fashioned” sentence analysis 
and diagramming, and spent time on spelling drills and rules for 
spelling which the 1935 classes did not spend. 

The experiment in neither quarter included any students who had 
taken the first tests earlier than three to five months before coming to 





1 A mid-quarter examination is given, and to a certain extent the difference 
between the grades on this and the grades on the final examination may be said 
to be objective. This evidence, and the theme grades and classroom record, are 
used chiefly to decide questionable borderline cases which result from the final 
examination. 

2 (a) Coöperative English Test, (b) Minnesota College Aptitude Test, and (c) 
Impromptu theme. Beginning with the year 1937, The American Council on 
Education Psychological Examination was substituted for the Minnesota test. 
the University. (Only Fall-quarter students were retested, in order 
that the results should indicate, as far as possible, gains made in Sub- 
freshman Composition, and not through participation in other Uni- 
versity courses.) In only one respect did the posttests differ from the 
pretests: the themes were graded by the instructors in Subfreshman 
Composition rather than by instructors in the Department of English. 

The establishment of a control group was extremely difficult, since 
any entirely satisfactory control would have meant refusing a large 
number of students the right to take Subfreshman Composition at the 
normal time. However, nineteen students were found who had not 
taken the course at the normal time, and who were willing to be retested 
after having spent only one quarter in the University without participa- 
tion in any English classes. In addition to these steps, the investiga- 
tors collected the records made in Composition 4 by the students who 
had been graduated from Subfreshman Composition. 


DO SUBFRESHMEN MEET THE COMPOSITION 4 STANDARD? 


It is obvious that this experiment will not give final answers to the 
three questions asked above. The results will be merely suggestive. 
Look, then, at the first question: Does the subfreshman course actually 
succeed in preparing students to pass the tests which originally kept 
them from being admitted to Composition 4? 

At the end of the first quarter of 1935, one hundred fifty-four sub- 
freshmen were retested. Of these, ninety-two were graduated to 
Composition 4, although according to the strict interpretation of the 
Placement Chart used by the Department of English only fifty-two 
students would have actually been eligible for regular English courses 
had they entered directly from high school. At the end of the first 
quarter of 1936, one hundred thirty-four subfreshmen were retested. 
One hundred and sixteen of these were passed to Composition 4, 
although only fifty-two (the number is a coincidence) were eligible 
according to the strict interpretation of the Placement Chart. 

In the 1935 groups, nineteen of the fifty-two students who met the 
strict requirements of the Placement Chart, did so by virtue of raising 
their C.A.T. scores slightly above fifty-one,¹ while their scores in other 
placement tests failed to meet the published requirements. In the 




1 Many students might have been exempt from Subfreshman Composition if 
their C.A.T. had been only a few points higher. No student is assigned to Sub- 
freshman Composition, regardless of his other test and theme scores, if he has a 
C.A.T. of 51 percentile rank. 
1936 group thirteen met the requirements. The increase in the mean 
C.A.T. score for both groups, considered as groups, is statistically 
significant, but the subject-matter of the C.A.T. is only incidental to 
the subfreshman course. Since the students mentioned above, who 
raised their C.A.T. scores slightly above fifty-one, did not meet the 
published requirements in usage and spelling, they may well be dis- 
counted in answering the question, ‘‘Do subfreshmen meet the Com- 
position 4 standard?”’ Thus, where they were retested on subject- 
matter with which Subfreshman Composition directly concerns itself, 
only thirty-three students in 1935, and only thirty-nine students in 
1936, actually raised their scores to the original fixed standard for 
admission to Composition 4. 
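
The counts given in the preceding paragraphs are easier to compare as proportions of each retested class. The short Python sketch below does only that arithmetic on the figures quoted in the text; the dictionary keys and the helper name `share` are illustrative labels of our own, not terms used in the study.

```python
# Counts quoted in the text for the two Fall-quarter classes.
classes = {
    1935: {"retested": 154, "graduated": 92,
           "eligible_by_chart": 52, "eligible_usage_spelling": 33},
    1936: {"retested": 134, "graduated": 116,
           "eligible_by_chart": 52, "eligible_usage_spelling": 39},
}

def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Express every count as a percentage of the students retested that year.
summary = {
    year: {key: share(value, counts["retested"])
           for key, value in counts.items() if key != "retested"}
    for year, counts in classes.items()
}

for year, row in summary.items():
    print(year, row)
```

On these figures the proportion graduated to Composition 4 is well above the proportion meeting the Placement Chart in both years, which is the contrast the text draws.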

These figures may be supplemented by Table I. In this table the 
mean test scores of five hundred ninety-four students assigned to 
Composition 4 in the Fall quarter of 1935 may be supposed to indicate 
approximately the standard for regular Composition 4 students. 


TABLE I. Comparison of Scores on English Placement Tests of Students
Assigned to Composition 4 with Scores of Subfreshman Students after
Instruction

                                      Subfreshman Composition (retest scores)
                 Composition 4               1935                  1936
Subtest         No.   Mean    SD       No.   Mean    SD       No.   Mean    SD
Usage           594  64.28  17.26      154  55.19  14.90      134  54.22  15.47
Spelling        594  27.80   8.69      154  20.67   8.93      134  20.78   8.02
Vocabulary¹     594  48.05  12.81      154  36.85  12.30      134  38.73  10.74
Theme           594   5.62    .90      150   6.28    .99      130   5.89    .94

1 In these years the vocabulary section of the Coöperative English Test was 
substituted for the Minnesota College Aptitude Test. 


When the mean scores of students assigned to Composition 4 are 
compared separately with those of students in each of the 1935 and 
1936 subfreshman classes, we find that in every case the former means 
are significantly higher than the latter (not shown in Table I). The 
critical ratios range from 2.98 (the 1936 theme) to 9.01. It is clear, 
therefore, that students in subfreshman courses, as a group, have not 
increased their scores, on the average, up to the mean scores of students 
assigned to Composition 4. Whatever other merits Subfreshman 
Composition may have, it is clear that it did not operate to overcome 
entirely those deficiencies which led to the original classification in this 
course. 
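
The critical ratios cited above are not tabulated, and the text does not state how they were computed. As a minimal sketch, assume the ratio is the difference between two means divided by the standard error of that difference for independent groups. Applied to the Table I figures, that assumption gives values close to the two extremes quoted: about 2.98 for the 1936 theme, and about 9.01 for the 1936 spelling test. Matching the upper extreme to the spelling comparison is our inference from the arithmetic, not a statement in the source, and the function name `critical_ratio` is our own.

```python
from math import sqrt

def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference of means over the standard error of the difference,
    treating the two groups as independent (an assumption)."""
    se_diff = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return abs(mean_a - mean_b) / se_diff

# Table I: Composition 4 students versus 1936 subfreshman retest scores.
theme_1936 = critical_ratio(5.62, 0.90, 594, 5.89, 0.94, 130)
spelling_1936 = critical_ratio(27.80, 8.69, 594, 20.78, 8.02, 134)

print(f"theme 1936:    {theme_1936:.2f}")
print(f"spelling 1936: {spelling_1936:.2f}")
```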


DOES SUBFRESHMAN COMPOSITION PREPARE STUDENTS FOR WORK 
IN COMPOSITION 4? 


In the Fall quarter of 1935, ninety-two students were graduated 
from Subfreshman Composition to Composition 4. In the Fall quarter 
of 1936, one hundred sixteen students passed Subfreshman Composi- 
tion. The records made in Composition 4 by one hundred fifty-three 
of these two hundred eight students are traceable. Table II shows the 
relationship between their subfreshman grades and their Composition 4 
grades. 


TABLE II. Comparison of Grades Received in Composition 4 with Grades
in Subfreshman English of Students Who Took Both Courses
Grades in Composition 4 








Grades in 
Subfreshman F E D C B A 
English 

A 

B as 3 12 

C 1 4 8 43 4 

D 5 3 24 34 4 

E 1 3 4 























It should be remembered that the students represented in this table 
were passed from Subfreshman Composition to Composition 4 on the 
basis of final examinations and final themes which made no pretense to 
being equivalent to the Placement Tests which originally led to their 
being assigned to Subfreshman Composition. 

This should also be pointed out: Winter- and Spring-quarter classes 
in Composition 4 generally are composed of subfreshman graduates. 
In fact, an effort is made to offer Composition 4 classes at the same 
hours at which subfreshman classes were offered during the preceding 
quarter, in order that students may go from Subfreshman Composition 
in one quarter to Composition 4 the next quarter without disrupting 
their programs. 

It is true that the subfreshman graduate must meet the competition 
of all other Composition 4 students when he takes his final examination 
in Composition 4, but this final-examination grade determines only a 
fraction of his grade in the course. Hence, in view of the evidence 
presented in Table III, it is perhaps not unfair to suggest that the 
standards of Composition 4 are changed, unconsciously, for many of 
the graduates of Subfreshman Composition. Certainly there appears 
to be very little relationship between the two sets of grades. Particu- 
lar attention is called to the fact that students with grades of D in 
Subfreshman English receive grades ranging from F to B in Composi- 
tion 4. Moreover, students with a grade of B in Subfreshman Com- 
position received C or D in Composition 4. It would appear that 
grading standards are not comparable in these two courses. 


WHAT DOES SUBFRESHMAN COMPOSITION ACCOMPLISH? 


Table III tells its own tale. The difference between the control 
group and the experimental group must be viewed with caution, since 


TABLE III. Summary of the Mean Scores on Test and Retest of Students
in Subfreshman English

                   First test           Retest         Critical
Subtest           Mean     SD        Mean     SD        ratio

A. Control Group of Nineteen Students Assigned to but Not Enrolled
in Subfreshman English in 1935

Usage            39.47   18.77      44.95   12.02       1.31
Spelling         14.00    6.86      17.21    7.37       1.39
Vocabulary       28.16    8.86      36.95   11.98       2.57
Theme             6.94    0.76       6.42    0.82       2.03

B. One Hundred Fifty-four Students Enrolled in Subfreshman English in 1935

Usage            41.79   12.08      55.19   14.90       8.67
Spelling         15.99    7.49      20.67    8.93       4.98
Vocabulary       32.73    9.30      36.85   12.30       3.32
Theme             7.15    0.95       6.28    0.99       7.78

C. One Hundred Thirty-four Students Enrolled in Subfreshman English in 1936

Usage            41.87   11.20      54.22   15.47       7.49
Spelling         15.81    6.61      20.78    8.02       5.54
Vocabulary       33.69    8.99      38.73   10.74       4.17
Theme             6.30    0.87       5.89    0.94       3.65

the control group is unavoidably small in comparison with the experi- 
mental group. However, this comparison is suggestive in that the 
gains for students in subfreshman classes are significantly larger than 
those for the small group of control students. 

The gains reported in Table III for subfreshman students are 
statistically significant for each subtest and for both years, the critical 
ratios ranging from 3.32 to 8.67. We have here definite evidence that 
instruction in Subfreshman Composition does result in very significant 
improvement as measured by standardized tests of usage, spelling, and 
vocabulary; there are also significant gains in the theme grades (the 
highest or “best” theme grade is 1). It cannot be charged, therefore, 
that instruction in Subfreshman Composition has no discernible effect 
upon the language facility of students. Obviously mere practice effect 
does not explain the significant gains. Unfortunately, no comparable 
data for Composition 4 are available to permit comparison of gains 
resulting from the two composition courses. Nevertheless, we may 
conclude that instruction in the subfreshman classes has resulted in 
very significant improvement, even though most students have not 
attained the original placement standards for assignment to Composi- 
tion 4. 
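
Although the same students were tested twice, the critical ratios printed in Table III agree closely with a formula for uncorrelated means, in which the gain is divided by the square root of the summed squared standard deviations over N. The sketch below assumes that formula, which the source does not state, and takes its figures from section B of the table (1935, N = 154). It is offered only as a check on the arithmetic, and the names are illustrative. The theme row is left out because Table I lists 150 rather than 154 theme scores for that class.

```python
from math import sqrt

N_1935 = 154  # students retested in 1935

# subtest: (first-test mean, SD, retest mean, SD, reported critical ratio)
table_iii_b = {
    "Usage":      (41.79, 12.08, 55.19, 14.90, 8.67),
    "Spelling":   (15.99, 7.49, 20.67, 8.93, 4.98),
    "Vocabulary": (32.73, 9.30, 36.85, 12.30, 3.32),
}

def gain_ratio(mean_1, sd_1, mean_2, sd_2, n):
    """Gain divided by its standard error, ignoring the correlation
    between test and retest (an assumption, not stated in the source)."""
    return (mean_2 - mean_1) / sqrt((sd_1 ** 2 + sd_2 ** 2) / n)

computed = {
    subtest: round(gain_ratio(m1, s1, m2, s2, N_1935), 2)
    for subtest, (m1, s1, m2, s2, _reported) in table_iii_b.items()
}

for subtest, values in table_iii_b.items():
    print(subtest, "computed", computed[subtest], "reported", values[4])
```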

The University bulletin on the Placement System used by the 
English Department, cited earlier, states: “The rating of the theme is 
the first (and for English composition the most important) factor in the 
placement table.” The gains made by subfreshman students in their 
impromptu themes are, hence, of especial interest, since substantial 
improvement is found for group averages. It might be contended 
that the comparison of the theme grades used in the original assignment 
of students with the final theme in Subfreshman English classes was 
not valid, since two different instructors graded the two themes. That 
is, different standards of grading might have been used by instructors 
in subfreshman classes. To check this point, the records of students 
graduating from Subfreshman English in 1936 and enrolling in Com- 
position 4 were analyzed. Complete data were available for seventy- 
six students (not shown in Table III). The mean grade for these 
students on their final impromptu theme in Subfreshman English was 
5.66 (SD 1.01); the mean grade of the first impromptu theme of these 
same students in Composition 4 was 5.55 (SD 1.01). Apparently, as 
concerns group means, the marking standards of themes are compar- 
able in the two classes. In other words, instructors in Composition 4 
agree, as evidenced by their marking, that these students have not 
met the standard expected of students admitted directly to Composi- 
tion 4. 

The scores on the vocabulary are particularly significant in view of 
the fact that these scores represent the college aptitude of the students 
tested. The mean scores and gains are then worth attention. It 
should be remembered that no student whose C.A.T. (Mean score on 
the vocabulary test) was above 51 was assigned to Subfreshman 
Composition. It is obvious that although the college aptitude of 
subfreshman graduates is significantly increased, yet these students 
are still, generally speaking, inferior students (see Table I). 

The different conditions involved in the 1935 and 1936 tests should 
be borne in mind. A study of grammar is, of course, always funda- 
mental in Subfreshman Composition, but in 1936 even more attention 
than usual was given to diagramming and the analysis of sentences. 
Moreover, a definite effort was made (not made in 1935) to drill the 
students in spelling, and to give them as many artificial aids to the 
mastery of spelling as possible. Despite these different conditions, 
the results for the two years do not differ much in statistically sig- 
nificant gains in test scores; indeed the gain in theme grades is less 
significant for the 1936 group. 


SUMMARY AND CONCLUSIONS 


The evidence of this experiment shows that the course in Subfresh- 
man English did not successfully prepare most of its students to pass 
the placement tests for Composition 4. Nevertheless those students 
who passed the subfreshman course automatically became eligible to 
enroll in Composition 4. Thus we see that the subfreshman course, 
to which are assigned students with low verbal facility as well as specific 
deficiencies in usage and spelling, was not able to overcome these handi- 
caps within three months time. 

The evidence does show, on the other hand, that a significant 
number of students who passed the subfreshman course were able to 
do satisfactory work in Composition 4 despite the fact that they had 
not met the requirements of the placement system. These students 
did, moreover, make significant gains in a standard test of usage, even 
though they did not, as a group, improve up to the standard required 
of other freshmen for assignment to Composition 4. 

Do these facts indicate that the placement system is invalid? 
That instruction in Subfreshman English is unsatisfactory? That the 
grading standards in Composition 4 are at fault? Or do they suggest 
that Subfreshman Composition is a mere period of penance for sins of 
omission or for basic mental incapacity? Has Subfreshman Composi- 
tion come to be a procedure wherein residence for three months makes 
eligible for regular English those students who continue to show 
inability to meet the published requirements? Or does the evidence 
indicate that instruction cannot overcome major deficiencies in lan- 
guage facility? Or—and this is most challenging—is it reasonable to 
expect that three months of instruction will enable students to over- 
come language deficiencies of many years’ standing? Is it reasonable 
to expect that so short a period of instruction will result in eligibility 
for Composition 4, by the original standards of classification, of seventy 
per cent of those students who have exhibited and continue to exhibit 
serious deficiency in English usage? 

Perhaps the basic conclusion, as well as a partial answer to the 
foregoing questions, is that, when all is said and done, there is no 
escaping the fact that Subfreshman Composition deals with definitely 
inferior students. There are, of course, exceptions to this rule. Many 
subfreshmen have actually distinguished themselves in advanced 
English courses as well as in other college courses, without the aid of 
altered standards of marking. But the ideal subfreshman is a rara 
avis. He must have a fairly high college-aptitude test score, and his 
specific deficiencies in English must result from lack of proper training 
and not from basic inability to learn the English language. Such a 
student is almost never assigned to Subfreshman Composition, by virtue 
of the rule cited above. 

In view of this fact, the unwritten but virtually arbitrary rule that 
at least seventy per cent of the subfreshmen each quarter must be 
graduated to Composition 4 is, to say the very least, open to severe 
criticism. Either a high mortality in Composition 4, or lowered grad- 
ing standards in Composition 4, would appear to be the inevitable 
result. 

Certainly no dogmatic recommendations should be made on the 
evidence of this report, but the authors do venture the following sug- 
gestions for consideration: 

1. Since it seems true that significant gains in Subfreshman Com- 
position still do not enable even a majority of the students to meet the 
standard requirements for Composition 4, may it not be said that one 
quarter of work in Subfreshman Composition develops good subfresh- 
man students but fails to develop good Composition 4 students? If 
this is true, and if there is a suspicion that standards are lowered in 
Composition 4 classes made up largely of subfreshman graduates, 
might it not be wise to offer a special two- or three-quarter course for 
students now assigned to Subfreshman Composition? This course 
should give the student his required nine credits in English. Definitely 
superior students might be allowed to transfer from this course (which 
might be called Composition 1-2-3) to Composition 4-5-6 at the end 
of any quarter. 

2. If some such plan as the above is not adopted, it seems advisable 
that the “qualifying tests” for admission to Composition 4 from Sub- 
freshman Composition should be determined, and it seems obvious 
that these must represent requirements somewhat different from those 
used for admission to Composition 4 directly from high school. 





THE EFFECT OF PRACTICE ON GROUPS OF 
DIFFERENT INITIAL ABILITY 


HERBERT WOODROW 
University of Illinois 


Practice curves have been obtained in two separate experiments. 
In one of these, fifty-six university students completed thirty-nine 
practice periods of ten minutes in each of seven tests. In the other, 
a group of eighty-two subjects, also university students, completed 
sixty-six practice periods in each of four tests. The data are being 
studied in a number of ways, but the present discussion is limited to 
the question of the effect of practice upon subgroups when the sub- 
groups are obtained by sectioning the total group according to initial 
ability. Predictions concerning the behavior of groups are more 
accurate than those concerning individuals; and from the practical 
viewpoint it would seem desirable to know what to expect after practice 
on the part of two groups, one of which was composed of individuals 
making high initial scores and the other composed of individuals 
making low initial scores. This paper will be devoted merely to a 
survey of the facts obtained with respect to this matter, and to a 
consideration of their exact meaning. The chief point of interest 
appears to be the convergence or divergence of the subgroups with 
practice. It will be indicated that the most illuminating statement 
of the facts is in terms of regression coefficients, and that scrutiny of 
these coefficients affords some suggestion as to the explanation of the 
facts. 

Of the two sets of subgroups which were studied, one was obtained 
by sectioning the total group of fifty-six and the other by sectioning the 
total group of eighty-two, both total groups being sectioned into five 
subgroups on the basis of initial scores. These two sets of subgroups 
were studied independently. The number of subjects included in the 
first set of subgroups, in order from the initially best to the initially 
worst, was as follows: seven, fourteen, fourteen, fourteen, seven; total 
fifty-six. The numbers in each of the second set were: eleven, twenty, 
twenty, twenty, eleven; total eighty-two. The reason a larger number 
of subjects was included in the three middle groups than in the extreme 
ones, is that such a selection tends to bring about greater uniformity 
in the spacing of the mean scores of the groups. If groups of constant 
size were used, the middle-most groups would differ from each other 
much less than the extreme groups on account of the massing of the 
scores near their central tendency. 
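
The sectioning just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `section_by_initial_score` and the synthetic scores are hypothetical, and only the subgroup sizes are taken from the text.

```python
def section_by_initial_score(scores, sizes):
    """Split subjects into subgroups, best initial scorers first.

    scores: dict mapping a subject id to an initial score.
    sizes:  subgroup sizes, ordered from initially best to initially worst.
    """
    if sum(sizes) != len(scores):
        raise ValueError("subgroup sizes must add up to the number of subjects")
    # Rank subject ids from highest to lowest initial score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups, start = [], 0
    for size in sizes:
        groups.append(ranked[start:start + size])
        start += size
    return groups


# Subgroup sizes reported in the text for the two total groups.
SIZES_56 = [7, 14, 14, 14, 7]
SIZES_82 = [11, 20, 20, 20, 11]

# Synthetic, distinct initial scores (illustrative only, not the study's data).
demo_scores = {subject: 30 + (subject * 37) % 56 for subject in range(56)}
demo_groups = section_by_initial_score(demo_scores, SIZES_56)
```

Making the three middle subgroups twice as large as the extreme ones is the choice the text motivates: scores mass near the center, so equal-sized groups would crowd the middle means together.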

Fig. 1.—Showing effect of practice upon the dispersion of subgroups in horizontal
adding.

The degree of divergence or convergence of the subgroups in the
case of any one performance may be measured by the ratio of the
standard deviation of the group means at the end of practice, termed
$\sigma_{F'}$, to the initial standard deviation of the group means, termed $\sigma_{I'}$
($F$ standing for final and $I$ for initial). Divergence will, then, be
indicated by a value over one, and convergence by a value less than one,
of the ratio $\sigma_{F'}/\sigma_{I'}$. The relation between this ratio and the
convergence or divergence of the subgroups is illustrated by Fig. 1 (raw
scores), which shows the results for horizontal adding in the case of the
subgroups composing the total group of fifty-six. The initial score of
each subject was taken as the average score made in five ten-minute
trials, one per day, and the final score as the average of the last three
scores; i.e., those made on the thirty-seventh, thirty-eighth, and
thirty-ninth days of practice. The score consisted in the number of
correct digits written in the correct place. The Spearman-Brown
reliability coefficient for initial score was .938 and for final score .976.
The means of the initial five subgroups, selected for initial ability,
were 69.1, 52.8, 46.2, 39.3, and 31.4; and the final score means for the
same groups (composed, of course, of the identical individuals) were
117.3, 93.3, 86.9, 72.3, and 61.9. The ratio $\sigma_{F'}/\sigma_{I'}$ is, therefore, equal
to 18.98 (i.e., $\sigma_{F'}$) divided by 12.82 (i.e., $\sigma_{I'}$) or 1.48. Since this ratio





1A detailed description of all the tests here used may be found in an article 
by the writer entitled ‘‘The Relation between Abilities and Improvement with 
Practice,” J. Educ. Psychol., 1938, pp. 215-230. 
is greater than one, the subgroups diverge with practice. In other
words, the initially better subgroups improved in score more than the
initially inferior groups. This fact is illustrated by Fig. 1.
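
The arithmetic behind the horizontal-adding ratio can be checked directly from the ten subgroup means quoted above. The sketch below assumes, as the reported figures suggest, that the standard deviation of the five means is taken unweighted and with divisor five (a population standard deviation); the variable names are illustrative.

```python
from statistics import pstdev

# Subgroup means for horizontal adding, group of fifty-six (from the text),
# ordered from the initially best to the initially worst subgroup.
initial_means = [69.1, 52.8, 46.2, 39.3, 31.4]
final_means = [117.3, 93.3, 86.9, 72.3, 61.9]

# Assumption: unweighted standard deviation of the five means, divisor 5.
sigma_initial = pstdev(initial_means)
sigma_final = pstdev(final_means)

# A ratio above one means the subgroups diverged with practice.
ratio = sigma_final / sigma_initial
```

Under that assumption the two standard deviations agree with the reported 12.82 and 18.98 to within rounding, and the ratio rounds to the reported 1.48.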

The values obtained for the ratios $\sigma_{F'}/\sigma_{I'}$ in the eleven cases studied
are shown in Table I. It is clear from this table that the behavior of
subgroups as regards divergence or convergence depends upon the
test-performance which is practiced. In the case of the majority of
the tests, the subgroups show convergence with practice, but the
amount of this convergence (indicated by the degree by which $\sigma_{F'}/\sigma_{I'}$ is
less than one) varies greatly, and in the case of several of the tests,
notably horizontal adding, the subgroups pull farther apart with
practice.


TABLE I.—COMPARISON OF CHANGE IN DISPERSION OF SUBGROUPS ($\sigma_{F'}/\sigma_{I'}$)
WITH REGRESSION COEFFICIENT ($b_{FI}$)

N    Test                                  $\sigma_{F'}/\sigma_{I'}$   $b_{FI}$   $r_{IF}$   $\sigma_F/\sigma_I$
56   Anagrams                                   .93       .91      .82      1.11
56   Substitution (digit-letter)                .98       .92      .59      1.57
82   Substitution (digit-letter)                .97       .93      .57      1.64
56   Spot-pattern (modified)                    .40       .34      .59       .57
56   Horizontal adding                         1.48      1.40      .74      1.88
82   Horizontal adding                         1.89      1.83      .73      2.50
56   Cancellation (multiple instruction)       1.08      1.10      .59      1.87
82   Cancellation two-digit                    1.15      1.14      .75      1.51
82   Cancellation four-digit                   1.25      1.28      .75      1.70
56   Speed (gates)                              .52       .56      .57       .99
56   Relative length                            .55       .52      .53       .98

What are the determining conditions of this variation in the behavior
of the subgroups? The answer to this question must at present
remain somewhat speculative; but it should be helpful to observe that
it depends primarily upon the same magnitudes which determine the
coefficient of regression of the final individual scores over the initial
scores. The formula for the regression coefficient may be written in
various ways, but, from the viewpoint of the psychology of practice,
the most illuminating way of conceiving it is to regard it as the product
of the following two measures: First, the correlation between the initial
and final individual scores of the total group, i.e., $r_{IF}$; and second, the
effect of practice upon individual differences in the total group,
measured by the ratio $\sigma_F/\sigma_I$, in which $\sigma$ here refers to the standard
deviation of the individual scores of the total group. The close
correspondence between the behavior of the subgroups as measured by
$\sigma_{F'}/\sigma_{I'}$ and the regression coefficient, $b_{FI}$, is sufficiently clear from
Table I. While the regression coefficients vary over a range of
approximately one hundred and fifty points (from .34 to 1.83), in no case is the
difference between the regression coefficient and the corresponding
value of $\sigma_{F'}/\sigma_{I'}$ more than eight points. And the coefficient of
correlation between the two sets of values, $b_{FI}$ and $\sigma_{F'}/\sigma_{I'}$, is over +.99.
With larger subgroups the correspondence should be even closer, since
then the final mean score should more exactly equal the regression
estimate thereof and, if the final subgroup means exactly equalled the
regression estimate, then $b_{FI}$ would exactly agree with $\sigma_{F'}/\sigma_{I'}$. This
relation is clear from the following formulae:


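$$F' = r_{IF}\,\frac{\sigma_F}{\sigma_I}\,I'$$

therefore,

$$\frac{F'}{I'} = b_{FI}$$

in which $F'$ equals the regression estimate of a subgroup final mean and
$I'$ the initial score of the same subgroup, both expressed as deviations
from the corresponding means of the total group. Therefore,

$$\frac{\sigma_{F'}}{\sigma_{I'}} = b_{FI}$$

in which $\sigma_{F'}$ here equals the standard deviation of the regression estimates
of the subgroup final means.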
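
The relations stated above can be checked numerically against the figures of Table I. The sketch below is an illustration, not part of the original analysis: it assumes the decimal points dropped in the printed table are restored (e.g., ".91" for "91"), and the helper `pearson` is a hypothetical name for an ordinary product-moment correlation.

```python
from math import sqrt
from statistics import mean, pstdev

# Table I rows: (N, test, sd ratio of subgroup means, b_FI, r_IF, sd ratio of total group).
TABLE_I = [
    (56, "Anagrams", 0.93, 0.91, 0.82, 1.11),
    (56, "Substitution (digit-letter)", 0.98, 0.92, 0.59, 1.57),
    (82, "Substitution (digit-letter)", 0.97, 0.93, 0.57, 1.64),
    (56, "Spot-pattern (modified)", 0.40, 0.34, 0.59, 0.57),
    (56, "Horizontal adding", 1.48, 1.40, 0.74, 1.88),
    (82, "Horizontal adding", 1.89, 1.83, 0.73, 2.50),
    (56, "Cancellation (multiple instruction)", 1.08, 1.10, 0.59, 1.87),
    (82, "Cancellation two-digit", 1.15, 1.14, 0.75, 1.51),
    (82, "Cancellation four-digit", 1.25, 1.28, 0.75, 1.70),
    (56, "Speed (gates)", 0.52, 0.56, 0.57, 0.99),
    (56, "Relative length", 0.55, 0.52, 0.53, 0.98),
]


def pearson(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# b_FI should equal r_IF times the total-group sd ratio, up to rounding in the table.
product_gaps = [abs(r * s - b) for (_, _, _, b, r, s) in TABLE_I]

# Gap between the observed subgroup ratio and the regression coefficient.
subgroup_gaps = [abs(ratio - b) for (_, _, ratio, b, _, _) in TABLE_I]

# Correlation between the two sets of values across the eleven cases.
r_between = pearson([row[2] for row in TABLE_I], [row[3] for row in TABLE_I])

# Regression estimates in deviation form, F' = b_FI * I': their spread is
# exactly b_FI times the spread of the initial subgroup means. The origin
# used for the deviations does not affect a standard deviation.
initial_means = [69.1, 52.8, 46.2, 39.3, 31.4]  # horizontal adding, group of 56
b_fi = 1.40
origin = mean(initial_means)
estimates = [b_fi * (m - origin) for m in initial_means]
estimate_ratio = pstdev(estimates) / pstdev(initial_means)
```

The largest subgroup gap is the horizontal-adding row for the group of fifty-six (1.48 against 1.40), which is the "eight points" mentioned in the text.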

It is clear, then, that the divergence or convergence of subgroups
is a matter of (1) the correlation between initial and final score, and
(2) the effect of practice upon individual differences. From these
considerations it is equally clear that the change in the separation of
the best and worst group, or change in the ratio of their mean scores,
as, for example, the change in the ratio of the means of the four initially
best subjects to the four initially worst,¹ should not be used as a
measure of the effect of practice upon individual differences. The
behavior of subgroups depends not only upon the effect of practice
upon individual differences, but upon the correlation between initial




1 For a discussion of this matter, see Reed, H. B.: “The Influence of Training
on Changes in Variability in Achievement.” Psychol. Monog., 1931, Vol. XLI,
pp. 14ff.
and final scores; in short, upon the regression coefficient. It is true
that, if the regression coefficient is over one, individual differences as
measured by the $\sigma$'s must have increased with practice, since the
correlation between initial and final scores is sure to be less than +1.0;
but the converse is not true, and in no case does the regression
coefficient give even an approximately reliable indication of the change in
individual differences.

When the reliability of the tests is low, the correlation between 
initial and final score tends to be low. In this case, the regression 
coefficient tends to be less than one, subgroups appear spuriously to 
draw together with practice, and the correlation between initial score 
and gain tends to be spuriously negative. In the present instance, 
however, regression is not to any marked degree spurious, since the 
reliability of the scores used in calculating the values of Table I was 
very high, averaging +.94 for both initial and final scores in the case 
of the group of fifty-six and +.95 for both initial and final scores in 
the case of the group of eighty-two. The reliability was controlled 
by the number of pooled days on which the score was based, and in the 
case of the group of eighty-two subjects, each initial and each final 
score had a reliability coefficient between +.94 and +.96. None of 
the regression coefficients, consequently, would be raised more than 
a few points if corrected for attenuation. The regression coefficients 
given in Table I, therefore, represent real changes due to practice, 
reflected first in change in the variability of the total group, and,
second, in correlations between initial and final score which fall
considerably below perfect because of reliable differences in the extent to
which the individuals in the total group change their score with
practice.

Incidentally, the regression coefficient gives some indication of the
correlation between initial score and gain, and the latter may be
calculated from the data contained in the regression coefficient by the
formula $r_{IG} = (b_{FI} - 1)(\sigma_I/\sigma_G)$, in which
$\sigma_G = \sqrt{\sigma_F^2 + \sigma_I^2 - 2\sigma_F\sigma_I r_{IF}}$.
Since the value $\sigma_I/\sigma_G$ is always positive, the correlation between initial
score and gain will be negative whenever the regression coefficient is
less than one.
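
The formula for the correlation between initial score and gain can be confirmed on any set of paired scores. In the sketch below the scores are invented for illustration, the function names are hypothetical, and standard deviations are taken with divisor N throughout so that the identity holds exactly.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_initial_gain(r_if, sd_initial, sd_final):
    """Correlation of initial score with gain, from r_IF and the two sds."""
    b_fi = r_if * sd_final / sd_initial
    sd_gain = sqrt(sd_final ** 2 + sd_initial ** 2
                   - 2 * sd_final * sd_initial * r_if)
    return (b_fi - 1) * sd_initial / sd_gain


# Invented initial and final scores for seven subjects (not the study's data).
initial = [31, 39, 46, 53, 69, 44, 58]
final = [62, 70, 90, 95, 117, 71, 99]
gain = [f - i for i, f in zip(initial, final)]

direct = pearson(initial, gain)
via_formula = r_initial_gain(pearson(initial, final),
                             pstdev(initial), pstdev(final))
```

Only the ratio of the two standard deviations matters for the sign, so a call such as `r_initial_gain(0.59, 1.0, 0.57)` (the spot-pattern values of Table I, with the initial standard deviation set to one) is enough to see that a regression coefficient below one implies a negative correlation with gain.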

To explain the effect of practice on subgroups of different initial 
ability, then, it is necessary to account for two important phenomena 
of practice: (1) The change in individual differences as measured by 
the change in the o of the distribution of the scores of the total group; 
and (2) the occurrence of a correlation between initial and final score 
of less than one. The effect of unreliability of scores on the correlation
coefficient is well understood and need not be further discussed. 
Assuming perfectly reliable scores, what factors account for the varia- 
tion between test-performances in the values entering into the regres- 
sion coefficient? 

One factor which greatly affects the apparent change with practice 
in individual differences is the nature of the units in which the prac- 
ticed performance is measured. The same performance may be 
scored in various units; for example, in terms of amount done per unit 
time or time required per unit work. But no matter in what units 
the scores are expressed, it is seldom, if ever, possible merely by 
inspecting the raw score units to determine whether score units at 
different parts of the scale represent equal amounts of ability. Scale 
units can be made proportional to ability units only by absolute 
scaling, which involves a hypothesis concerning the distribution of 
ability. It has been found in the case of four of the test-performances 
practiced by the group of fifty-six subjects that the data do not conflict
with the hypothesis of a normal distribution of ability at each
of the various stages of practice. The data for these four tests have, 
accordingly, been subjected to Thurstone’s method of absolute scaling. 
The method by which this was accomplished has been described in 
detail elsewhere.¹ By this method the results obtained remain the
same irrespective of the nature of the raw scores. It makes no 
difference whether amount done or time scores or error scores are used. 
The raw scores are eventually thrown away, and in their place are 
substituted z-value scores, that is, scores dependent simply on the 
proportions of the total population which attain each of the observed 
raw scores. The proportion of the population failing to pass or 
exceed any given score is regarded as a proportion of a normal distri- 
bution area. The z-value of that score, then, is the distance on the 
base-line of a normal distribution area, measured from the mean 
of the distribution to the point below which falls the observed pro- 
portion. Thus the z-value of a score, below which fall seventy-five 
per cent of the population, is +.6745. If the population remains 
normally distributed in ability at two different stages of practice, the 
line of relation between the two sets of z-values will be a straight line. 
There is a check, then, on the correctness of the hypothesis on which 
the scaling is based, a feature, be it noted, which is provided by no 





1 Woodrow, H.: “Absolute Scaling of Practice Data.” Psychometrika, Vol. II,
1937, pp. 237-247.
other method of absolute scaling. The four tests, the scores for which
met the criterion of a linear relation between their z-values and which
were accordingly scaled, were the following: Anagrams; digit-letter
substitution; a modified spot-pattern test; and horizontal adding.¹
In terms of raw scores, one of these tests, horizontal adding, showed 
marked divergence; another, the spot-pattern test, the scores for 
which were error scores, showed very marked convergence; while the 
remaining two, substitution and anagrams, yielded curves for the various 
subgroups which ran almost parallel. In terms of the scaled scores, 
however, all four tests showed moderate convergence. The marked 
effect upon the apparent behavior of the subgroups due to the use of 
scaled scores instead of raw scores is illustrated in Figs. 1-4. 
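
The z-value conversion described above can be illustrated with the standard library's normal distribution. This is a sketch of the conversion step only, not of Thurstone's full scaling procedure; the function names and the sample scores are hypothetical.

```python
from statistics import NormalDist


def z_value(proportion_below):
    """Base-line distance from the mean of a unit normal distribution to
    the point below which the given proportion of the area falls."""
    return NormalDist().inv_cdf(proportion_below)


def z_values_for_scores(scores):
    """Map each observed raw score to the z-value of the proportion of the
    group falling below it. Scores with nobody below them are skipped,
    since a proportion of zero has no finite z-value."""
    n = len(scores)
    table = {}
    for score in sorted(set(scores)):
        below = sum(1 for x in scores if x < score)
        if 0 < below < n:
            table[score] = z_value(below / n)
    return table


# Invented raw scores for ten subjects (illustrative only).
sample_scores = [40, 45, 45, 50, 55, 55, 55, 60, 65, 70]
sample_table = z_values_for_scores(sample_scores)
```

Because only proportions enter the computation, the z-values carry the rank information in the raw scores and nothing else, which is why the choice of raw unit drops out. The check of the normality hypothesis mentioned in the text would then consist in plotting the z-values at one stage of practice against those at another and looking for a straight line.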

Absolute scaling does not change the significance of the regression
coefficient as regards the effect of practice on the subgroups. The
values $\sigma_{F'}/\sigma_{I'}$ are still approximately equal to $b_{FI}$, as shown by the
following tabulation (Table II).


TABLE II.—VALUES OF $b_{FI}$ AND $\sigma_{F'}/\sigma_{I'}$ FOR SCALED SCORES

Test                             $\sigma_{F'}/\sigma_{I'}$   $b_{FI}$   $r_{IF}$   $\sigma_F/\sigma_I$
Anagrams                              .75       .71      .82       .87
Substitution (digit-letter)           .84       .77      .62      1.25
Spot-pattern (modified)               .80       .76      .61      1.24
Horizontal adding                     .58       .51      .72       .70

















As may be seen by comparing the values given in Table II with 
those in Table I, absolute scaling had no significant effect on the 
correlation between initial and final scores. It exerts its profound
effect upon $b_{FI}$, and upon the convergence or divergence of groups,
almost entirely because of its effect upon the change with practice in
individual differences (measured by $\sigma_F/\sigma_I$). With absolute scaling
the regression coefficient is in all four cases less than one, and the 
initially best group shows less improvement than the initially worst. 

The data offer no support to any hope that absolute scaling will 
show the same change as regards the effect of practice on individual 
differences in a given group of subjects in the case of different tests. 
It is true that in the present instance absolute scaling shows that, in all 
the tests scaled, practice had the result of bringing the subgroups 





1 For a detailed description of these tests, see Woodrow, H.: “The Relation
between Abilities and Improvement with Practice,” J. Educ. Psychol., 1938, pp. 215-230.
Fig. 2.—Effect upon subgroup means of practice in anagrams.

Fig. 3.—Effect upon subgroup means of practice in digit-letter substitution.

Fig. 4.—Effect upon subgroup means of practice in a spot-pattern test.
Scores are error scores.
closer together in ability. In two of these tests, however, substitution
and spot-pattern, individual differences increased with practice. It 
is not difficult to understand this situation, though a full explanation 
would require a detailed account of all the factors affecting both 
correlation between initial and final score and the effect of practice on 
individual differences. Ewert¹ lists the following factors: (a) The
criteria for learning, (b) the method of computing the data, (c) the 
units of measurement chosen, (d) the extent of practice, (e) the type 
of material under consideration, and (f) the degree of motivation. 

Very important would seem to be the extent of individual differ- 
ences in practice prior to the experiment in either the test practiced or 
similar tests. The result of this pre-experimental practice upon the 
effect of that given during the experiment should depend upon the 
shape of the learning curves of the subjects for the period covered by 
the experiment. If these curves were negatively accelerated, then 
those subjects who, preceding the experiment, had had the most 
practice, and who, therefore, on the average should make a relatively
high initial score, would improve less, on the average, than those who 
had had little pre-experimental practice and who, therefore, would 
tend to average lower initially than those with much pre-experimental 
practice. It would follow in such a case that individual differences 
should decrease with practice, providing, of course, that the scores 
used varied proportionally to ability. It is plausible that these con- 
siderations largely account for the marked decrease in individual 
differences with practice in horizontal adding, since it is known that 
some of the subjects were engineering students constantly working 
with numbers while others were taking exclusively liberal arts courses 
which never required mathematical computations. 
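
The argument from pre-experimental practice can be made concrete with a toy model. Everything in the sketch below is an assumption made for illustration: the exponential form of the curve, the ceiling and rate, and the amounts of prior practice are hypothetical, and only the thirty-nine days of experimental practice echo the text.

```python
from math import exp


def score_after(total_practice, ceiling=100.0, rate=0.05):
    """A negatively accelerated practice curve: each added unit of practice
    yields a smaller gain than the one before it."""
    return ceiling * (1.0 - exp(-rate * total_practice))


def initial_and_final(prior_practice, experimental_practice=39):
    """Scores at the start and end of the experiment for a subject who
    arrives with a given amount of pre-experimental practice."""
    initial = score_after(prior_practice)
    final = score_after(prior_practice + experimental_practice)
    return initial, final


# Hypothetical amounts of pre-experimental practice, least to most.
prior_amounts = [0, 10, 20, 40]
results = [initial_and_final(p) for p in prior_amounts]
gains = [final - initial for initial, final in results]

# Separation between the most and least practiced subject, before and after.
initial_spread = results[-1][0] - results[0][0]
final_spread = results[-1][1] - results[0][1]
```

In this model the subjects with more prior practice start higher and gain less, so the spread between them shrinks over the experiment, which is the decrease in individual differences the paragraph describes.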

On the other hand, when the mean practice curves of all subgroups 
are negatively accelerated and individual differences nevertheless 
increase with practice, as in digit-letter substitution and spot-pattern, 
even when scored in absolute units, some other explanation is called 
for. It seems desirable, therefore, to point to another source of 
changes with practice in individual differences which has not hitherto 
been given the attention it deserves. The cause of these changes lies 
largely in the fact that improvement with practice is accompanied by, 
and probably in large part depends upon, a change in the way in which 





1 Ewert, H.: ‘‘The Effect of Practice on Individual Differences when Studied 
with Measurements Weighted for Difficulty.”” J. Gen. Psychol., 1934, Vol. X, 
p. 253. 
the subject performs the task. The task remains the same only in a
blue-print sense; that is, the test-blank and the specifications given
by the experimenter remain constant, and, to a considerable degree,
the general nature of the accomplishment resulting from the operations
of the subject. But the pattern of these operations whereby the task
is accomplished changes markedly in the course of practice. Now a
factor analysis which has been made of the data here discussed¹ has
shown that final scores are dependent upon a different pattern of 
abilities than initial scores. To say that the way in which the task 
is performed changes with practice is to say much the same thing as 
that there is a change with practice in the pattern of abilities which 
determine the score. Now suppose, for example, that final perform- 
ance depended mainly on ability Y and initial ability mainly on 
ability X, and that individual differences were greater in ability Y 
than in ability X—then individual differences would tend to increase 
with practice; and vice versa. 

The change with practice in the pattern of abilities brought into 
play in the execution of the task also could account for the falling-off 
with practice in the correlation coefficient, since the abilities here 
supposed are either uncorrelated or correlated only to a small degree. 
An analysis by the centroid method gives an account of the scores in 
terms of totally uncorrelated abilities. And if only orthogonal trans- 
formations of the axes immediately given are made, as was the case in 
the factor-analysis of the present data, the abilities represented by the 
various axes remain totally uncorrelated. Now when the scores are 
accounted for in terms of uncorrelated abilities, it is obvious that if 
initial scores were altogether dependent upon ability X and final scores 
altogether dependent upon ability Y, the correlation between initial 
and final score would be zero; and further, that any considerable change 
in the degree to which the scores depended upon the two uncorrelated 
abilities would result in a correlation between initial and final score 
markedly below a perfect correlation. The greater the change in the 
ability pattern, when the abilities are uncorrelated, the lower should 
be the correlation between initial and final score. 
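
The two consequences drawn from a changing ability pattern can be illustrated by treating each score as a weighted sum of uncorrelated abilities. The weights and ability standard deviations below are hypothetical, and the function names are illustrative; the sketch shows the arithmetic of the argument rather than the factor analysis actually carried out.

```python
from math import sqrt


def score_sd(weights, ability_sds):
    """Standard deviation of a score that is a weighted sum of
    uncorrelated abilities with the given standard deviations."""
    return sqrt(sum((w * s) ** 2 for w, s in zip(weights, ability_sds)))


def score_correlation(initial_weights, final_weights, ability_sds):
    """Correlation between initial and final score when both depend on the
    same set of uncorrelated abilities with different weights."""
    covariance = sum(a * b * s ** 2 for a, b, s
                     in zip(initial_weights, final_weights, ability_sds))
    return covariance / (score_sd(initial_weights, ability_sds)
                         * score_sd(final_weights, ability_sds))


# Two uncorrelated abilities, X and Y; individual differences are assumed
# to be twice as large in Y as in X (hypothetical values).
ABILITY_SDS = [1.0, 2.0]

# Extreme case: initial score depends only on X, final score only on Y.
r_complete_shift = score_correlation([1, 0], [0, 1], ABILITY_SDS)
sd_initial = score_sd([1, 0], ABILITY_SDS)
sd_final = score_sd([0, 1], ABILITY_SDS)

# Partial shift in the pattern, with equal spread in the two abilities.
r_partial_shift = score_correlation([0.9, 0.1], [0.3, 0.7], [1.0, 1.0])
```

With a complete shift from X to Y the correlation is zero and the spread of scores grows with practice; a partial shift gives a correlation between zero and one.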

In conclusion, it is desired to emphasize that the phenomena here 
discussed form an important aspect of the results of practice and learn- 
ing. In recent years, experimental psychologists, working largely 
with the white rat, have made notable progress in the explanation of 





1 Woodrow, H.: Op. cit.
learning. Applied to practice, however, the theories advanced are
helpful mainly in understanding simply the fact that performance
improves with practice, i.e., that the practice curves rise. Before
any account of the phenomena of practice and learning can be complete,
however, it must take account of two other important phenomena:
the change in individual differences and the falling-off
with practice in the correlation between the first obtained and the last
obtained scores.


SUMMARY 


The degree of convergence or divergence of groups selected accord- 
ing to initial ability is primarily a function of the regression coefficient. 
From the point of view of the psychology of practice, the regression 
coefficient may in turn best be regarded as the product of two values, 
the first of which, $\sigma_F/\sigma_I$, is an index of the effect of practice on individual
differences and the second, $r_{IF}$, is the correlation between initial and
final scores. The effect of the nature of units upon these values, and, 
therefore, upon the convergence or divergence of subgroups selected on 
the basis of initial ability, is shown by a comparison of the results 
obtained by the use of raw scores with those resulting from an applica- 
tion of Thurstone’s method of absolute scaling. With absolute scaling 
it was found that subgroups resulting from the sectioning of a larger 
group on the basis of initial score drew closer together with practice 
in all four of the tests scaled. Even with absolute scaling, however, 
two of the tests showed an increase in individual differences with prac- 
tice. Explanatory hypotheses were advanced to account for both the 
changes in individual differences and the falling-off with practice in 
the correlation between initial and final score—the two factors which 
taken together determine the effect of practice upon the convergence 
or divergence of subgroups of different initial ability. 








A STUDY OF THE MALLER AND BOYNTON
PERSONALITY INVENTORIES¹


D. B. HARRIS AND D. H. DABELSTEIN 


University of Minnesota 


Within recent years psychologists and educators have been devoting 
considerable attention to the nature of character and personality traits 
and to the possibility of measuring such traits or qualities objectively. 
Two paper and pencil tests have appeared recently which attempt to 
investigate pupil adjustment in a different manner from that of the older 
psychoneurotic inventories. One of these tests, the Case Inventory 
by J. B. Maller, consists of a battery of four subtests and appears in 
equivalent forms, A and B, suitable for the purpose of test and retest. 


In his Manual of Directions,? Maller designates the four subtests as 
follows: 


(1) Controlled association test for the measurement of rationality. (Fifty
items selected from a larger test of the same nature.)
(2) Adjustment test—a self-description inventory of personal and social
adjustment. (Fifty items selected from the author’s previous work with
“character sketches.”)
(3) Self-scoring test for the measurement of honesty in classroom
situations. (Fifteen items from a test of Sports and Hobbies used to examine
overstatement.)
(4) Ethical judgment test—problems of moral conflict and a self-evaluation
in respect to ethical standards (nine items).


Maller states that items included in both forms of the Case Inven- 
tory have been selected on the basis of demonstrated validity and 
reliability. He states that the correlation of Forms A and B on two 
hundred forty-eight cases gave a reliability coefficient for the total test 
score of +.936 and coefficients between +.901 and +.962 for the 
various subtests. As far as the writers know, further data on reliability 
and validity are not at present in print. 

The B.P.C. Personal Inventory developed by Paul L. Boynton 
consists of a single page of forty-two items significant for social and 





1 Mr. Dabelstein was responsible for gathering the data and scoring tests;
Mr. Harris wrote the article. Both authors contributed to the statistical
computations. The writers are indebted to Dr. P. M. Symonds for valuable
criticisms.

2 Maller, J. B.: Case Inventory, Manual of Directions. New York: Teachers
College, Columbia University.
emotional adjustment, stated in interrogative fashion with a “yes-no”
response following each. The general form and content of the B.P.C.
Inventory is quite similar to the Woodworth-Mathews Inventory,¹
which has been in use for a number of years. Boynton, however,
provides four scoring keys concerning which he says in the Manual of
Directions:²


If one wishes to check for personality abnormalities, the key headed Per.
should be used. If the problem is one of scholastic maladjustment, the
key headed Sch. should be used. If the detection of abnormal behavior
tendencies is the problem, then the key headed Con. should be used. The
Gen. key probably should be used rather rarely. It lacks the specificity of
meaning of the other three keys. It is for the most severe general problem
cases, personality, scholastic, and conduct, all combined and in exaggerated
form.


To the writers’ knowledge, no specific data are available in print to 
indicate the reliability and validity of the B.P.C. Inventory. 

There is, of course, need for a rather extended empirical evaluation 
of a test before it is put to extensive usage. In brief, the purpose of 
this study was to make some contribution to the statistical evaluation 
of the two inventories. 

Both of the measures under consideration were administered to 
four hundred twenty-one students in the Litchfield, Minnesota, public 
schools, Grades V through IX. The Kuhlmann-Anderson group test 
of mental ability was used to obtain the mental ages of the subjects. 
In addition, the Van Wagenen Unit Scales of Attainment were adminis- 
tered to the pupils of the fifth, sixth, seventh, and eighth grades. All 
pupils through the eighth grade were residents of Litchfield itself. 
Approximately sixty per cent of the ninth-grade pupils came from the 
rural areas surrounding the town. It is the belief of the writers that 
this sampling is representative of the average small town school in 
Minnesota. 

Table I presents means and standard deviations of raw scores, by 
grades, on the subtests of the Maller Case Inventory. Means and 
standard deviations of total scores (obtained in each case by summing 
raw scores of the subtests) are also included. The differences between 
the means of succeeding grades on any of the subtests or on the total 
score are generally statistically insignificant; the differences between 





1 Mathews, Ellen: ‘‘A Study of Emotional Stability in Children.”’ Journal of 
Delinquency, Vol. VIII, 1923, pp. 1-40. 
2 Boynton, Paul: Manual of Directions. Nashville: George Peabody College. 
TABLE I.—STATISTICAL CONSTANTS COMPUTED FROM GRADE DISTRIBUTIONS OF
SUBTEST AND TOTAL SCORES ON MALLER INVENTORY

               Subtest I    Subtest II   Subtest III  Subtest IV            Total score
Grade    N     Mean   SD    Mean   SD    Mean   SD    Mean   SD    Mean    SD     Q1      Md      Q3
V        63    37.4   5.3   32.8   7.0   13.0   5.8   20.8   5.8   102.6   14.2    92.2   103.5   112.7
VI       62    38.7   5.4   35.4   6.8   14.8   5.5   20.8   5.3   108.1   11.5    99.4   107.7   117.1
VII      69    40.9   5.2   35.4   6.9   18.6   3.3   25.8   4.5   118.5   11.8   111.5   119.4   126.2
VIII     62    39.5   5.9   39.8   5.4   16.8   4.3   27.2   3.9   119.3   11.7   110.2   120.9   127.3
IX      165    42.0   5.3   38.8   5.7   16.5   5.0   26.4   4.8   121.2   10.7   118.9   121.4   129.1





means of grades separated by two or more years are usually significant. 
A striking increase in average total score points may be noticed between 
the sixth and seventh grades. Inspection of the subtests shows that 
only the test of moral discrimination (No. IV) contributes noticeably 
to this gain in mean total score. On the whole the data indicate, if 
taken at face value, a slightly greater degree of pupil “‘adjustment”’ 
in junior high than in elementary grades. Since the present data were 
established on different populations in each grade, an immediate 
explanation is not available. One can, however, legitimately raise the 
question of the test’s relationship to mental maturity. An analysis of 
this relationship will be included shortly. 

The Pearson r coefficients of intercorrelation for several variables 
appear in Table II. The interrelationships of the various subtests of 


TABLE II.—INTERCORRELATIONS OF TEST VARIABLES FOR FOUR HUNDRED
TWENTY-ONE CASES, PROBABLE ERRORS OF INDIVIDUAL COEFFICIENTS
VARYING FROM .033 TO .005

Variable                        2     3     4     5     6     7     8     9     10
 1. Controlled association    .186  .169  .094  .602  .482  .287  .260  .353  .247
 2. Adjustment                ....  .028  .306  .709  .357  .484  .572  .377  .183
 3. Self-scoring              ....  ....  .082  .446  .168  .189  .134  .158  .199
 4. Ethical judgment          ....  ....  ....  .624  .241  .498  .289  .367  .216
 5. Total score               ....  ....  ....  ....  .445  .556  .574  .523  .290
 6. B.P.C. conduct            [illegible]
 7. B.P.C. scholastic         [illegible]
 8. B.P.C. personality        [illegible]
 9. Mental age                [illegible]
10. Chronological age
the Maller battery are uniformly positive but low. Correlations with
total score are higher, but, to be sure, a certain amount of self-correla- 
tion obtains where each subtest is correlated with total score. The 
relation of all the various subtests of the Maller Inventory to mental 
age and to chronological age seems relatively slight. The relationship 
between total score and mental age expressed in terms of a Pearson 
coefficient of correlation (r = +.523) is comparable to the general 
relationship between school achievement and intelligence. With 
chronological age, the correlations are lower. The partial correlation 
technique assists in showing the real relationship of the Maller total 
score to mental and chronological maturity. The correlation of 
+.523 between mental age and Maller total score remains practically 
unchanged (+.516) when chronological age is held constant. How- 
ever, the correlation of +.290 between chronological age and Maller 
total score falls to —.274 when mental age is held constant. It 
appears, then, that there is an appreciable positive relationship between 
total score on the Maller Inventory and mental maturity as measured 
by the Kuhlmann-Anderson tests, while the relationship between 
chronological age and Maller total score is slightly negative when the 
effect of mental age is controlled. 
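
The partial correlation technique referred to above uses the standard first-order formula, sketched below. The text does not state the correlation between mental age and chronological age, so the value 0.80 used here is purely a placeholder chosen for illustration; with it the results will not reproduce the reported +.516 and -.274 exactly, though the pattern is the same. The function and constant names are hypothetical.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations reported in the text.
R_TOTAL_MA = 0.523   # Maller total score with mental age
R_TOTAL_CA = 0.290   # Maller total score with chronological age

# NOT from the source: a placeholder for the correlation between mental
# age and chronological age, which the text does not give.
R_MA_CA_PLACEHOLDER = 0.80

# Total score with mental age, chronological age held constant.
total_ma_given_ca = partial_correlation(R_TOTAL_MA, R_TOTAL_CA,
                                        R_MA_CA_PLACEHOLDER)

# Total score with chronological age, mental age held constant.
total_ca_given_ma = partial_correlation(R_TOTAL_CA, R_TOTAL_MA,
                                        R_MA_CA_PLACEHOLDER)
```

With any reasonably high correlation between the two ages, the first partial stays close to the zero-order value while the second turns negative, which is the pattern the authors report.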

The relationship between mental ability and Maller total score was 
examined further by use of scores on the Van Wagenen Unit Scales of 
Attainment. The pupils in the highest and lowest sixths of the dis- 
tributions of the Maller total score for each grade were selected—those 
pupils falling outside the area cut off by plus and minus one standard 
deviation from the mean of the whole group. Table III shows the 
median chronological age, median mental age, and median educational 
age for the deviate groups in each grade. 

In any grade it is apparent that scores in the lowest sixth of the 
Maller distribution are made by pupils duller mentally and of lower 
educational attainment than pupils of average or better than average 
adjustment according to the Maller test. If total score be used as a 
unit measure of adjustment of which the four subtests measure only 
slightly related parts, allowance must be made for the effect of mental 
ability. However, at the present state of understanding of the com- 
plex phenomenon of ‘‘adjustment,’’ it is probably safer to consider the 
subtests separately. In such case, the effect of mental ability on test 
results is less significant. 

The B.P.C. Inventory was scored, as the author suggests, with the 
three keys calculated to designate separately abnormal conduct, 
scholastic maladjustment, and personality difficulties. Table II,
showing the intercorrelations of these three keys, raises some doubt as 
to the individuality of any one of the qualities mentioned. Correla- 
tions of +.897, +.915, and +.919 bear more resemblance to reliability 
coefficients than to coefficients of correlation between supposedly dis- 
tinct and separate qualities. It would appear from the relationships 
established in this study that one key is about as good as any other for 
measuring any or all of the types of personality difficulties mentioned 
by the author. It is possible that the test-retest reliability coefficient 
of the inventory itself would not exceed the relationship expressed by 
a coefficient of +.900. 


TABLE III.—MEDIAN CHRONOLOGICAL, MENTAL, AND EDUCATIONAL AGES OF
PUPILS IN GRADES V, VI, VII, AND VIII COMPARED WITH MEDIAN AGES
OF PUPILS IN THE LOWEST AND HIGHEST SIXTHS OF THE CASE INVENTORY
IN THE SAME GRADES

                      Lowest sixth,                  Highest sixth,
                      case inventory     Average     case inventory
Grade V       CA          11-4            10-7           10-8
              MA          10-4            11-0           11-6
              EA          10-6            11-3           12-6
Grade VI      CA          12-6            11-6           11-4
              MA          10-10           12-4           13-8
              EA          11-8            12-8           13-0
Grade VII     CA          12-10           12-9           12-8
              MA          11-10           12-9           13-10
              EA          13-0            13-8           14-4
Grade VIII    CA          14-4            13-8           13-6
              MA          13-2            14-0           14-8
              EA          13-6            14-4           16-6

















Coefficients of correlation between each of the Boynton keys and 
mental ability and chronological age, respectively, are low; the rela- 
tionship with chronological age being consistently lower than with 
mental age. The partial correlation technique reveals that the real 
relationship between the keys and mental age is partially obscured by 
chronological age. When chronological age is held constant, the 
correlations with ability mount as follows: 
Conduct key: from +.163 to +.341
Scholastic key: from +.349 to +.476 
Personality key: from +.365 to +.469 


When mental ability is held constant, the correlations of the various 
keys with chronological age drop as follows: 


Conduct key: from +.044 to —.316 
Scholastic key: from +.077 to —.381 
Personality key: from +.116 to —.334 


While the relationships indicated by the correlational technique are 
not high, they certainly warrant consideration in any study which 
attempts either to establish the validity of the tests in question or to 
use the tests in programs of guidance. 

The intercorrelations presented in Table II indicate positive rela- 
tionships of varying degrees of magnitude among practically all 
variables. As the table stands, it is difficult to describe the general 
relationships which obtain among the variables. The simplified 
multiple factor method developed by L. L. Thurstone¹ offers a means of
accounting for the relationships observed in a correlation table in 
terms of a limited number of more or less distinct general factors 
which, it must be admitted, are statistical entities and may or may not 
be identifiable as psychologically “unique traits.”

A factor analysis by the Thurstone method indicates that three 
general factors, with possibly a fourth, will account pretty well for the 
relationships expressed in Table II. The loadings for each of the four 
factors derived are reproduced in Table IV. The factor analysis 
method does not, of course, supply names with which to describe 
psychologically the entities derived by the method. One must be 
extremely cautious in assigning terms so as not to commit the fallacy 
so frequently found in ‘‘trait names’’—loose, general terms attached 
to qualities not any too well isolated experimentally and defined 
psychologically. By inspecting the sign and numerical value of the 
weightings in each factor, one can discover which tests contribute 
most to the factor. All ten variables seem to contribute positively to 
Factor I, particularly the Maller total score, the Boynton keys, and 
mental age. It is possible that Factor I may be described as an 
ability or facility with words and verbal material. Controlled associa- 
tion and the Boynton keys contribute the highest loadings to Factor II, 





1 Thurstone, L. L.: A Simplified Multiple Factor Method and an Outline of the 
Computations. Chicago: University of Chicago Bookstore. 
TABLE IV.—FACTOR LOADINGS FOR FOUR FACTORS, TEN VARIABLES, COMPUTED
BY THE THURSTONE SIMPLIFIED CENTER OF GRAVITY METHOD

                                           Factor
Variable                         I       II      III      IV     Sum of squares
 1. Controlled association     +.518   +.445   +.048   −.563         .786
 2. [Adjustment]               +.617   −.057   −.188   +.277         .496
 3. Self-scoring               +.319   −.103   −.053   −.301         .206
 4. Ethical judgment           +.527   −.190   −.003   +.184         .348
 5. [Maller total score]       +.864   −.256   −.690   −.187        1.322
 6. B.P.C. conduct             +.731   +.687   +.040   −.072        1.013
 7. B.P.C. scholastic          +.815   +.444   +.032   +.254         .927
 8. B.P.C. personality         +.799   +.471   +.042   +.230         .915
 9. [Mental age]               +.676   −.432   +.450   −.093         .854
10. Chronological age          +.473   −.515   +.542   −.194         .820

    [row label illegible]       .429    .147    .098    .072




















while mental age and chronological age relate negatively. Here, one 
would suspect from the nature of the tests involved that the entity can 
best be described psychologically as atypicality or irrationality of 
response in situations generally conceived to indicate “normal” 
emotional behavior. Factor III is apparently a maturity factor, with 
chronological age and mental age showing the most appreciable load- 
ings. Factor IV is not very well described, but the Boynton keys and 
the adjustment test of the Maller battery contribute to some extent. 
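As a rough check on Table IV, the "sum of squares" column can be recomputed from the four loadings of each variable, and a correlation implied by the factors can be reproduced as the sum of cross-products of two rows of loadings. The sketch below is an illustration, not the author's computation; variables are keyed by their row numbers in Table IV, and the function names are hypothetical.

```python
# Loadings on Factors I-IV, keyed by the variable's row number in Table IV.
LOADINGS = {
    1: (0.518, 0.445, 0.048, -0.563),
    2: (0.617, -0.057, -0.188, 0.277),
    3: (0.319, -0.103, -0.053, -0.301),
    4: (0.527, -0.190, -0.003, 0.184),
    5: (0.864, -0.256, -0.690, -0.187),
    6: (0.731, 0.687, 0.040, -0.072),
    7: (0.815, 0.444, 0.032, 0.254),
    8: (0.799, 0.471, 0.042, 0.230),
    9: (0.676, -0.432, 0.450, -0.093),
    10: (0.473, -0.515, 0.542, -0.194),
}

# "Sum of squares" column as printed in Table IV.
TABLED_SUMS = {1: 0.786, 2: 0.496, 3: 0.206, 4: 0.348, 5: 1.322,
               6: 1.013, 7: 0.927, 8: 0.915, 9: 0.854, 10: 0.820}


def sum_of_squares(row):
    """Sum of squared loadings for one variable."""
    return sum(v * v for v in row)


def reproduced_r(row_a, row_b):
    """Correlation between two variables implied by their factor loadings."""
    return sum(a * b for a, b in zip(row_a, row_b))


recomputed = {k: sum_of_squares(v) for k, v in LOADINGS.items()}
for k in sorted(LOADINGS):
    print(f"variable {k:2d}: recomputed {recomputed[k]:.3f}, tabled {TABLED_SUMS[k]:.3f}")

# Correlation between variables 9 and 10 implied by the four factors.
r_9_10 = reproduced_r(LOADINGS[9], LOADINGS[10])
print(f"reproduced r(9, 10) = {r_9_10:.3f}")
```

Each recomputed sum agrees with the printed column to within rounding, which suggests the loadings in the table were transcribed consistently.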

The self-scoring test of honest behavior contributes very little to 
any of the factors here described. It may be that this observation is 
further testimony to the conclusions of Hartshorne and May¹ that
honesty is related to specific situations and does not seem to exist as a 
general factor of which different people possess varying amounts. 

In resumé, one may say that within the limits of this study the 
following appear: 

(1) Mean scores of junior-high-school grade groups on the Maller 
subtests exceed those of elementary grade groups. The increase in 
total score is more distinct from grade to grade, with the greatest gain 
appearing between Grades VI and VII. 

(2) Scores on the subtests of the Maller Inventory are positively 
correlated to mental ability and chronological age only to a slight 





1 Hartshorne, Hugh, and May, Mark: Studies in Deceit. New York: The 
Macmillan Co., 1928. 
degree. Total scores on the Maller test are more definitely related to
mental ability. This correlational relationship is positive, and is more 
appreciable than the relationship of total scores with chronological age, 
which is negative. 

(3) Scores on the Boynton Inventory are also related positively to 
mental ability and negatively to chronological age. The high inter- 
correlations of the keys suggest that one key could well take the place 
of the three or four which the author supplies. 

(4) A factor analysis of the data by the Thurstone simplified center 
of gravity technique indicates that three general factors, possibly four, 
will account for the relationships among the various tests of the Maller 
battery, the keys of the Boynton Inventory, mental and chronological 
age. One factor appears to be analogous to general mental ability, 
another seems best described as a matter of peculiarity of emotional 
response patterns, while a third factor is probably simply a matter of 
mental and chronological maturity. A fourth general factor is not so 
well described. 

(5) In light of the fact that differential weightings of various tests 
in any one factor occur, it is probably advisable to pay close attention 
to scores on subtests as well as to total scores. The total score as such 
seems weighted differently in the factors than the subtests. It is 
possible that attention to the total score alone might neglect facts 
indicated by scores on particular subtests. 
MEASUREMENT OF CULTURAL KNOWLEDGE


A. R. LAUER 


Iowa State College, Ames, Iowa 


PROBLEM 


Various arguments for the study of aesthetic subject-matter in the 
schools, as a means of raising the general cultural level of the popula- 
tion, have been advanced. Most of the premises of such arguments 
have been assumed on a priori grounds. It was the purpose of the 
present study to devise ways and means of measuring cultural knowl- 
edge and to ascertain the relationships which exist between certain 
contributory factors of cultural development. The effects of educa- 
tional influences were considered of primary importance, in as much as 
cultural values are given a great deal of weight in modern educational 
theory. 

A satisfactory definition of culture may be difficult to formulate. 
According to the dictionary meaning it is given as being nearly 
synonymous with the term enlightenment. Certainly, those who are 
conversant with topics which are the most frequent subject of ordinary 
conversation in the better circles of society are to be considered cul- 
tured. The cultured person is usually thought of as one who speaks 
and writes of the finer things of life. He may, at times, comment 
upon the weather, but most small talk about the dross and vulgar 
things of the environment is left to the indulgence of those lacking 
in general information and breadth of view. Taking as a basis the 
subject-matter of the fine arts, as the least equivocal content for such 
a test of cultural knowledge, it was the main objective of this study 
to find out the degree of mastery shown by a given school population. 


METHOD AND PROCEDURE 


A list of one hundred fifty items, including phrases, terms, descriptive
words, names of men, and terminologies peculiar to the fine arts
and literature, was selected by consultation with artists, musicians,
painters, and critics, and by reference to dictionaries and other sources
of artistic nomenclature. This list was organized into a multiple-choice
type of test. Each item was followed by the initials of each of the 
seven categories chosen to be represented in the test; viz., painting, 
architecture, sculpture, music, drama, literature, and non-artistic. 
Thus each test item had seven possible answers which might be given, 
reducing the error of chance marking to one in seven. The items were
arranged in two columns on each page and mimeographed, each item
appearing as follows:


1. moderato     P A S M D L N     ______


The line at the right was used for marking or writing the correct letter
or initial of the proper answer, i.e., the category to which it belonged.
This makes the test much easier to score than the method used by 
Strong and others for marking attitude tests. 

Three parallel forms of the test were constructed by substituting 
nearly equivalent words in the three. The order of corresponding
terms or items in Forms 1 and 2 was the same, while the order of items
in Form 3 was reversed, i.e., the first item in Form 3 was matched
with the last in Form 2, etc. The categories used were almost equally 
represented in each of the forms. In music such names as Beethoven, 
Mozart, and Handel were used as parallel items. In painting names 
like Rembrandt, Millet, and Bonheur were used interchangeably. 
Again, in architecture such words as lattice, cornice and cupola were 
coordinated. That the equality of difficulty of items was quite well 
matched may be seen from the relatively high reliability coefficients 
obtained between odds versus even items of the same test and the 
correlation between two forms of the test. The reliability was found 
to be +.91 when corrected for length of test. About five hundred 
preliminary tests were run to evaluate the procedure before the regular 
experimental series were given. 

For the experimental series, a total of two thousand eighty-nine 
complete records were obtained from high-school, college, and university
students in the following states: California, Colorado, Iowa, Illinois,
Kansas, Maryland, Minnesota, Mississippi, Montana, Nebraska,
New York, Oregon, Ohio, and Wisconsin. The subjects listed 
addresses from practically every state in the Union. The scores were 
punched on Hollerith Cards, and an item analysis made of the three 
forms of the test. Only certain results obtained will be presented 
in this paper. 





RESULTS 


About one hundred seventy-one cases were studied in the pre- 
liminary experiments during which relationships between the variables 
were calculated. Data relating to certain educational advantages 
and other personal facts were requested of examinees on the first 
page of the test. The correlations of these variables are shown in
Table I. The relationships between intelligence and cultural knowledge
and between college average and cultural knowledge are to be
noted particularly.


TABLE I.—INTERRELATIONS BETWEEN CULTURAL KNOWLEDGE AND OTHER
VARIABLES¹

                                   1       2       3       4       5       6
1. [Age]
2. [Intelligence]                 .11
3. College average                .08     .36
4. Cultural knowledge            −.09     .47     .38
5. Esthetic training             −.04     .05     .04     .31
6. Field of interest             −.06     .16    −.02     .18    −.02
7. [Size of residence town]      −.08     .22     .04     .31     .12     .12




















1 Reproduced in modified form, from abstract of paper given before Iowa
Academy of Science and published in proceedings, for convenience of readers.
Publication not easily available.


By multiple correlation technique, the associated variables in order 
of merit are found to be: (1) intelligence, (2) three-quarters-grade 
average, (3) training in the fine arts, (4) size of residence town, (5) 
age, and (6) field of special interest. The slightly higher relationship 
between interest test results and the average of three quarters grades 
in college suggests that such tests may be more valuable as prognostic 
instruments than intelligence tests as such.

An item analysis of the three forms was made. Some typical
results are listed in the following tables. The number right
varied from the limits of zero to ninety-five per cent. The distribution 
curve of the percentage correct for each test was fairly symmetrical. 
It approximates the normal curve. In most cases women were superior 
to men, both with respect to means and individual items. Reversals 
of this were relatively uncommon although a few instances may be 
noted. The following represent some of the items in which men 
were superior to women out of the total of three hundred items in 
two tests. They represent those which are likely significant. 


UPPER QUARTILE              LOWER QUARTILE
James Russell Lowell Rameses 
As You Like It Keats 
stretto C. H. Judd 
Guarnerius 
stringendo 


Dalu 





290 The Journal of Educational Psychology 


Analysis of nationality differences shows that students of American
and English stock are closely contested by Orientals for top place. 
Those of French, German, and Jewish extraction are about average. 
Of the samplings secured the South Europeans and Scandinavians 
were below average with Negroes at the bottom of the list. 


TABLE II.—NATIONALITY DIFFERENCES

Nationality or race          Number of cases     Mean number right
[illegible]                        576                 80.8
[illegible]                        145                 78.8
[illegible]                        228                 77.6
[illegible]                         39                 76.5
[illegible]                        230                 76.0
[illegible]                        417                 75.9
[illegible]                         84                 70.9
[illegible]                        114                 66.8
[illegible]                        191                 65.6
[illegible]                         65                 59.8

Total                             2089











Some very interesting results were noted in relation to training in
artistic fields. With special study in the fine arts up to about one 
hundred eighty months, or fifteen years, the general knowledge of 
artistic subject-matter increases. Since there is a parallel in the 
training of different arts it seems that this corresponds in general to 
high-school and college age. At two hundred sixty months there 
seemed to be a drop in the curve until about three hundred forty 
months, after which the upward trend was again resumed. This 
would correspond to the late college and graduate years of study. 
The indication is that a certain saturation point, resulting in a plateau, 
is experienced during college age even for the students who major in 
the fine arts. 

Further comparison of the scores was made in relation to college 
years. A flattened place on the curve appeared during the sophomore 
and junior years. This plateau is characteristic of the composite curve 
including both sexes. When men and women were treated separately, 
differential results were obtained. The women seem to improve 
throughout the college course while men showed a slump during the 
sophomore and late junior years. The inference is that men cease 
to improve their general knowledge of cultural subjects during the
college course but again begin the serious consideration of such matters 
after graduating from college. 

A comparison of the different types of subject-matter covered in 
the test was made. In America, certain arts are much more popular 
than others. This is obvious and needs no further elaboration. Music 
and literature are taught more or less universally. Painting and 
architecture, second to sculpture, would be expected to be the back- 
ward-sister arts. The relative standing of the arts as determined in 
this study is shown in Table III. 

While the categories of the different forms seem to be somewhat 
unequal in difficulty, the means of the whole tests run fairly consistent 
as shown in the summary of Table IV given below. 


TABLE IV.—SUMMARY OF PERCENTAGE MISSED

                            Men                                Women
               Freshman    Junior    Graduate     Freshman    Junior    Graduate     Mean
               and         and       students     and         and       students¹    of all
               sophomore   senior                 sophomore   senior
[Form 1]         51.1       54.7       42.5         50.3       41.7       44.9        47.5
[Form 2]         53.9       54.6       41.9         50.2       52.3       19.4        45.4
[Form 3]         49.3       47.8       45.7         48.3       42.2       29.6        43.8
Mean of all      51.4       52.4       43.4         49.6       45.4       31.3























1 There are fewer cases in this category. Summary means are not weighted. 


A differential study of the items missed based on all two thousand 
eighty-nine cases in the upper and lower ranges of scores revealed a 
close relationship between the items missed by men and women. 
The extreme items of Form 1 and Form 2 are shown in Table V. 

The first group of least missed items shows a seventy per cent
correspondence. The most frequently missed items are slightly more
variable and show only a 52.4 per cent correspondence. Considering
the possibility of variation from chance at the extremes, and the
relatively small range of fluctuations for each group, this seems quite
significant.

From Table III it is shown that the familiarity with artistic sub- 
ject-matter of college students of the type measured here ranges from 
66.7 per cent for music to 30.6 per cent for sculpture. Next to
music, with respect to general knowledge of subject-matter, come
literature, architecture, non-artistic or scientific, painting, and drama
in descending order of merit.


SUMMARY AND CONCLUSIONS 


A study of cultural information carried on in fourteen states 
involving over two thousand high-school, college, and graduate 
students showed a gradual growth in cultural knowledge during and 
after the period of formal education with the exception of male students 
from sophomore to senior level, when a slight regression occurred. 
Women seem to be slightly superior to men on most items at all levels 
of educational accomplishment. 

Persons of English and American stock show superior ratings on 
the tests used. The results parallel, to some extent, the results 
of other investigators on intelligence tests when nationalities are 
compared. 

Analysis of items of the three forms of the test showed some 
variation in difficulty of the categories used in the separate forms, but 
when composited certain subject-matter was shown to be much better 
known than others. 

The agreement of widely separated geographical groups was quite 
close on certain of the items used. American culture seems quite 
homogeneous regardless of the geographical expanse of territory.

The best known subject-matter was music and literature. Drama 
and sculpture are least known by the subjects studied. Since women
gain consistently throughout the college course and men suffer a slump
during this time, the question is raised whether college courses
for men are actually cultural or largely vocational in nature.
While the question may be of relatively little importance, it does 
suggest a review of the objectives of the college curriculum with a view 
toward revision of methods or of changes in the courses proper. 
THE MEASUREMENT OF A PERSONALITY TRAIT


FRANCES SWINEFORD 


University of Chicago 


Many personality traits are of necessity measured by means 
of questionnaires. No other methods are at present available. Such 
means are not desirable because the test-wise child seeks the answer 
which he thinks will give him a high rating, and thereby may stray 
somewhat from actual facts. Moreover, even the most conscientious 
individual may possess a personal bias which is reflected in his answers. 
Recently, however, Wiley and Trimble¹ reported an experiment which
gave evidence of the existence of a personality trait that might be 
measured in more objective fashion. They administered to fifty-nine 
students four tests, each with instructions that the student should 
indicate after each response whether he was sure, in doubt, or guessing, 
with the understanding that the response would be weighted according 
to the degree of certainty with which it had been made. Actually, 
however, the scores were not weighted, but were obtained by the usual 
formula, R-W. The numbers of items marked “sure,” “doubt,” and
“guess” were then recorded for each test. The six intercorrelations
among the four tests were computed for each of these measures, and
have been summarized in Table I. Each figure in the table is the mean
of the six corresponding correlations. 


TABLE I.—MEANS OF CORRELATIONS AMONG FOUR TESTS
Wiley and Trimble Data

                               Score      Number of       Number of       Number of
                               (R-W)      items marked    items marked    items marked
                                          “sure”          “doubt”         “guess”
Mean of six correlations       .393       .656            .576            .566

















The values of the table indicate that the tests measured the
students’ apparent confidence in their responses more consistently
than knowledge of subject-matter. The implications may be stated
in the words of the authors:


1 Wiley, Llewellyn N. and Trimble, Otis C.: “The Ordinary Objective Test as a
Possible Criterion of Certain Personality Traits.” XLIII, March 28, 1936,
pp. 446-448.
If in responding to the different items of an objective test, and then marking
the individual responses for “certainty,” “doubt,” “guess” and “unwillingness
(to make a choice)” any constant factor, or factors, is revealed, then
it may be assumed that some personality trait, or traits, is operative or has
been revealed.


The same principle of classifying test-item responses has been 
employed by another investigator,¹ who gave a seventy-five-item test
to sixty-two college students. His instructions follow: 


Mark a statement + if you judge it to be true and 0 if false. You may 
claim credit up to 4 points for each of your responses if you wish. Before 
your responses encircle 4, 3, or 2, depending on the credit you want. If 
your answer is wrong, the penalty will be double the amount of credit you claim. 
(It is advisable to claim 4 credits if you are sure your answer is correct.) 

If you claim no special credit as described above, you should, nevertheless, 
answer all questions, even if you have to guess. All such answers will be scored 
in the ordinary way, right-minus-wrong. 





Each question was then preceded by “4—3—2.” The heavy
penalty discouraged the student from marking all his responses “4”
on the chance that he was a lucky guesser.
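The scoring rule in these instructions can be sketched as follows. This is a hypothetical illustration of the rule as quoted (claimed credit is gained when right, double the claimed credit is lost when wrong, and unclaimed items are scored right-minus-wrong); the data layout and function names are assumptions, and the even chance of a correct guess on a true-false item is assumed as well.

```python
def weighted_score(responses):
    """Weighted score over items for which credit (4, 3, or 2) was claimed.

    `responses` is a list of (is_correct, credit) pairs; credit is None
    when no special credit was claimed, and such items are left out.
    """
    total = 0
    for is_correct, credit in responses:
        if credit is None:
            continue
        # Right: gain the credit claimed. Wrong: lose double that amount.
        total += credit if is_correct else -2 * credit
    return total


def unweighted_score(responses):
    """Ordinary right-minus-wrong score over all answered items."""
    return sum(1 if is_correct else -1 for is_correct, _ in responses)


def expected_gain_from_guess(credit, p_correct=0.5):
    """Expected points from a blind guess on which `credit` is claimed."""
    return p_correct * credit - (1 - p_correct) * 2 * credit


# Hypothetical paper: two items claimed at 4, one at 2, one unclaimed.
paper = [(True, 4), (False, 4), (True, 2), (False, None)]
print(weighted_score(paper))        # 4 - 8 + 2
print(unweighted_score(paper))      # 1 - 1 + 1 - 1
print(expected_gain_from_guess(4))  # 0.5 * 4 - 0.5 * 8
```

With an even chance of being right, a blind guess marked “4” loses two points on average, which is why the penalty discourages indiscriminate claims.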

The papers were scored both in the ordinary way, R-W, and by the 
weighting method. Those responses for which no credit was requested 
were not included in the weighted scores. The reliability coefficients 
for the two methods were .72 ± .04 and .85 ± .02, respectively. The
author emphasized the fact that weighting the responses increased 
the reliability even though the weighted scores were based upon fewer 
items. By recommending that test scores be weighted in this manner, 
he was assuming that the student asked for credit in accordance with 
his certainty that the answer was correct. 

If Wiley and Trimble are correct in postulating that a personality 
trait is operative in the marking of individual responses, then Soder- 
quist’s weighted scores may very well appear more reliable because 
they measure both knowledge and this trait. In such event, he is 
not justified in using the weighted score as a measure of only knowledge 
of subject-matter. 

This type of experiment has been repeated by the writer in order to 
seek a measure of the personality trait, which will hereafter be referred 
to as the tendency to gamble. 





1 Soderquist, Harold O.: ‘‘A New Method of Weighting Scores in a True-false 
Test.”’ Journal of Educational Research, Vol. XXX, December, 1936, pp. 290-292. 
A seventy-five-item true-false test was administered to one hundred
sixty college students.¹ The instructions to the student were the same
as those used by Soderquist. All reliability coefficients were computed 
for split halves (odd vs. even items) and stepped up by the Spearman- 
Brown formula. The unweighted scores were found to have a relia- 
bility of .5909, while the weighted scores, omitting all items for which 
no credit was claimed, yielded a coefficient of .6802. These are the 
values which correspond to those reported by Soderquist. 

Now it is apparent that each weighted score is based upon a 
different set of items, since no two individuals claimed identical 
credits throughout the test. The scores, therefore, are not strictly 
comparable. Furthermore, the range in score on the variably selected 
sets of items is greater than that for the original test, because the 
number of items for which some credit was claimed ranged from 
none to seventy-five. This increase in range of score can alone be 
responsible for an increase in the reliability coefficient. For purposes 
of illustration, eight scores have been computed for each student: 
unweighted and weighted scores for those items which he marked
“4,” his “4’s” and “3’s,” his “4’s,” “3’s,” and “2’s,” and the complete
test. Table II gives the reliability coefficients, mean number of items
attempted, and mean score for each of these classifications. 


TABLE II.—RELIABILITY COEFFICIENTS AND MEAN SCORES FOR ONE HUNDRED
SIXTY CASES

                        Reliability            Mean number          Mean score
Set of items      Unweighted    Weighted       of items        Unweighted   Weighted
                                               attempted
4                 .815 ± .02    .698 ± .03       46.4            30.24       88.68
4 + 3             .777 ± .02    .673 ± .03       56.8            33.61       88.24
4 + 3 + 2         .685 ± .03    .680 ± .03       67.2            35.91       84.74
Total test        .591 ± .03    .675 ± .03       74.2            37.72       86.56

















If the reliability coefficients of this table could be accepted at 
their face value, the obvious conclusion would be that the most reliable 
results are obtained from those items marked “4” and that the
unweighted score is far superior to the weighted one. As has been 
pointed out, however, the selection of a different set of items by each 





1 Dr. G. T. Buswell very kindly made this test a required part of an under- 
graduate course in Educational Psychology at the University of Chicago. 
individual renders the comparison of their scores entirely invalid. The
reliability coefficients, then, must be interpreted with care. 

It is not recommended that the sets of items represented in the 
first three lines be used to obtain achievement scores. It was noted 
in grading the papers that a few excellent students were timid about 
claiming maximum credit, while some of the poorer students asked for 
credit of four points on as many as fifteen to twenty of the items which 
they had answered incorrectly. It is extremely unlikely that a student 
who has attended lectures and read the assignments should be so 
misinformed that he can make so many errors with perfect confidence 
that he is correct. It is much more reasonable to suppose that the 
student having a preponderance of “4’s” among his errors is attempting
to increase his score, even though the odds against him are two to
one. In other words, he is gambling on those items which he does not 
know. If this tendency to gamble is a personality trait which is not 
intimately associated with the ability being measured, then it should 
not be permitted to affect the achievement score. 

The reliability of the weighted scores remains almost constant no 
matter how the items are selected. The mean weighted score likewise 
exhibits little variation. These facts suggest that the score is really 
determined by those items with the heaviest weight, the “4’s.” In
fact, fifty per cent of the total weighted scores differ from the corresponding
scores on the items marked “4” by not more than ten
points, or one-fifth the standard deviation. The scores on the “4’s”
are obviously dependent upon the number of “4’s” marked. For
this reason the weighted score should not be employed as a measure of 
achievement. 

If the tendency to gamble is a distinct trait, it should be possible 
to obtain a measure of it which is independent of the achievement 
score. An individual may be said to gamble on his score to the extent 
that he asks for credit for those items which he does not know. It is 
impossible to determine which of the correct responses were guesses, 
but for the present it will be assumed that all the incorrect responses 
were guesses, although it is possible that some may have been due to 
misinformation or to ambiguity in the true and false statements. 

The number of “4’s” marked by the one hundred sixty students
ranged from zero to seventy-five, with but five scores at the extremes.
The range in the number of “3’s” was zero to thirty-six, with nineteen
zeros; the number of “2’s,” zero to fifty-three, with seventeen zeros; and
the number of “no credit” items, zero to seventy-five, with sixty-nine
extreme scores. The correlation between the number of “4’s” on
the odd and even items is .911 and that for the “3’s” is .788. These
become .953 and .875 upon application of the Spearman-Brown
formula. Like the corresponding values given by Wiley and Trimble
(Table I), they are higher than achievement score reliability
coefficients. The “4’s,” then, provide a more discriminating and a more
reliable measure than any other single credit value.
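The stepping-up of a split-half correlation by the Spearman-Brown formula can be sketched as follows. This is an illustration only, not the authors' computation; the function name and the default doubling factor are assumptions. Applied to .911 it reproduces the .953 quoted above. Applied to .788 it gives roughly .881 rather than the printed .875, so one of the two printed figures for the “3’s” may have been altered in transcription.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Estimate full-test reliability from the correlation between two halves.

    `factor` is the ratio of the lengthened test to the part test
    (2.0 for the usual odd-even split). Hypothetical helper for illustration.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


if __name__ == "__main__":
    # Odd-even correlation for the number of "4's" reported in the text.
    print(round(spearman_brown(0.911), 3))  # 0.953
```

The same function serves for any split-half coefficient in the tables, provided the two halves are of equal length.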


Accordingly, a gambling score has been obtained by the following 
formula: 


\[
\text{Gambling} = G = \frac{\text{Errors marked ``4''}}{\text{Total errors} + \tfrac{1}{2}\,\text{omissions}} \times 100
\]


which is the percentage of the errors for which four points were 
claimed. The reliability of the G score is equal to .796, a value which 
compares favorably with the reliability coefficients of the test scores 
in Table II. The correlation between G and the total unweighted 
score, .084 + .05, being insignificant, the gambling score may be 
considered independent of the achievement score. The correlation 
between G and the total number of “4’s” is equal to .870, which is
increased to .890 when the total achievement score (R-W) is partialled
out. (The correlation between total “4’s” and R-W is .307.)
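A minimal sketch of the gambling score follows, reading the partly illegible denominator of the printed formula as total errors plus one-half the omissions. The function name, argument names, and the sample counts are hypothetical and serve only to show the arithmetic.

```python
def gambling_score(errors_marked_4: int, total_errors: int, omissions: int) -> float:
    """Percentage form of G: errors claimed at four points over errors plus half the omissions.

    The one-half weight on omissions is an assumed reading of the printed formula.
    """
    denominator = total_errors + 0.5 * omissions
    if denominator == 0:
        raise ValueError("G is undefined when there are no errors and no omissions")
    return 100.0 * errors_marked_4 / denominator


if __name__ == "__main__":
    # Hypothetical student: 20 errors, 12 of them marked "4", and 8 omitted items.
    print(gambling_score(12, 20, 8))  # 50.0
```

A student who makes no errors and omits nothing has no G score under this reading, which agrees with the later remark that the test must be difficult enough for even the best students to make errors.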

In setting up the formula for G, it was assumed that all errors 
represented guesses. While the validity of such an assumption cannot 
be tested for any individual, it is possible to examine the items of the 


TABLE III.—DISTRIBUTION OF TEST ITEMS ACCORDING TO NUMBER OF ERRORS
AND PERCENTAGE OF ERRORS MARKED “4”





                                        Number of errors
Percentage of errors
marked “4”      0-9 | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | 90-99 | 100-109 | 110-119 | 120-129
ae 1 
ar 1 née Ew ics 1 
60-69......... 1 1 ” 2 1 1 1 1 
50-59. “a 2 2 1 1 1 1 
40-49......... 2) 4 1 3 1 1 1 1 1 
30-39......... 1 41 5] 2 1 3 3 1 1 1 
ee 4; 3] 2]... 2 
A ie it ) se 1 1 
test for evidence of misinterpretation by the group as a whole. Table
III summarizes the analysis of the seventy-five test items, which have
been distributed in a two-way table according to the number of errors
made by the one hundred sixty students and the percentage of those
errors which were marked “4.”

If a given item is guessed by every one, the chances are even that 
half will answer it correctly. It was decided, therefore, to eliminate 
all items answered incorrectly by more than one-half (eighty) of the 
students. It was decided also to eliminate all items for which the
percentage of errors marked “4” exceeded fifty per cent except those
answered incorrectly by fewer than fifty students. Thus fourteen
items were arbitrarily eliminated, and new scores computed on the 
basis of the remaining sixty-one items. 
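The two elimination rules just stated can be expressed as a small filter. The sketch below is illustrative; the function and parameter names are assumptions, and the item counts in the example are invented rather than taken from Table III.

```python
def keep_item(n_errors: int, pct_errors_marked_4: float, n_students: int = 160) -> bool:
    """Return True if an item survives both elimination rules described in the text."""
    # Rule 1: drop items missed by more than one-half of the students.
    if n_errors > n_students / 2:
        return False
    # Rule 2: drop items on which more than fifty per cent of the errors were marked "4",
    # unless fewer than fifty students answered the item incorrectly.
    if pct_errors_marked_4 > 50 and n_errors >= 50:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical items given as (number of errors, percentage of errors marked "4").
    items = [(85, 10.0), (60, 55.0), (40, 70.0), (60, 45.0)]
    print([keep_item(errors, pct) for errors, pct in items])  # [False, False, True, True]
```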

The reliability coefficients for the unweighted score, the weighted 
score (omitting items for which no credit was claimed), and the 
G score now become .580, .716, and .769, respectively. The gambling 
score is still independent of the R-W score (r = .089), while the number
of “4’s” has a correlation of .339 with the R-W score. The correlation
between G and the total number of “4’s” is .748, or .766 with the R-W
score held constant.
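The figure obtained with the R-W score held constant follows from the usual first-order partial correlation formula. The sketch below is an illustration with assumed function and argument names; it takes the three zero-order coefficients quoted in the text and recovers .766 for the sixty-one items and .890 for the original seventy-five, the latter on the assumption that the .084 quoted earlier for the total unweighted score is the correlation of G with R-W.

```python
from math import sqrt


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


if __name__ == "__main__":
    # x = G, y = number of "4's", z = R-W score.
    print(round(partial_corr(0.748, 0.089, 0.339), 3))  # 0.766 (sixty-one items)
    print(round(partial_corr(0.870, 0.084, 0.307), 3))  # 0.89  (seventy-five items)
```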

In view of the foregoing results it may be concluded that the 
gambling score formula yields a reliable measure of a trait which is 
independent of achievement on the same test. The inclusion of a few 
items which may have been somewhat ambiguous or misinterpreted 
by a majority of the students did not seriously affect the correlation 
coefficients. Any objective test may be utilized to obtain a G score. 
It is important that the test be a relatively difficult one, in order that 
even the best students will make a sufficient number of errors to provide 
a reliable gambling measure.


CERTAIN ASPECTS OF THE VALIDITY OF THE F
SCORES OF THE BERNREUTER PERSONALITY 
INVENTORY 


WALTER F. ST. CLAIR AND J. CONRAD SEEGERS 


Temple University 


A previous article¹ based upon an analysis of the Bernreuter
Personality Inventory scores of one thousand one hundred sixty-two
freshmen presented evidence indicating a degree of validity in the 
Inventory and the probable desirability of considering the several 
scale scores in interrelationship, rather than singly. Especially was it 
suggested that the B2-S score, which seemed to disclose tendencies 
toward either withdrawal or self-sufficiency, should be interpreted 
in such manner. 

The technique which was used in the first article to throw light upon
Bernreuter scores is used in this paper to examine Flanagan scores,²
and further consideration is given the interrelationship or profile
method of interpreting Bernreuter scores.

Flanagan’s³ contention is that the Bernreuter Personality Inventory
portrays only two discrete traits. Evidence presented in the
article cited previously, by the present writers, raises some doubts
as to the legitimacy of that contention. This evidence suggested that
different traits are measured by B1-N (neurotic tendency), B2-S
(self-sufficiency), and B4-D (dominance), but that B3-I (introversion) and
B1-N are nearly identical. Flanagan proposes, instead of the four
B scores, F1-C and F2-S scores,⁴ which, he claims, describe the two
discrete traits indicated by the Bernreuter Personality Inventory.
In the tables which follow immediately are tabulated the F1 and F2
scores of students selected because circumstances of personal history
or environment might lead one to expect from them certain responses,
or to find in them certain types of personality.⁵ The F2 scale is





¹ St. Clair, W. F. and Seegers, J. C.: “Certain Aspects of the Validity of the
Bernreuter Personality Inventory.” Journal of Educational Psychology, Vol.
XXVIII, October, 1937, pp. 530-540.

² Bernreuter, Robert G.: Manual for Personality Inventory. Stanford University
Press, 1935.

³ Flanagan, John C.: Factor Analysis in the Study of Personality. Stanford
University Press, 1935, pp. 123.

⁴ Hereafter, for convenience, these have been designated generally as B1, B2,
B4, F1 and F2.

⁵ The manual for the Bernreuter Personality Inventory describes the F1 and

especially significant in this treatment, because of the high degree of
community between F2 and B2 scores, which the evidence cited above 
indicates should be subjected to the profile type of interpretation. 

The findings in this paper were derived from the administration 
of the Bernreuter Personality Inventory to one thousand one hundred 
sixty-two Temple University freshmen during the 1934-1935 sessions 
and from personal information questionnaires completed by the same 
individuals. 


TABLE I.—PERCENTILE NORMS FOR BERNREUTER PERSONALITY INVENTORY AND
NORMS FOR TEMPLE FRESHMEN

F1-C

  Raw   | Flanagan | Temple freshman | Flanagan | Temple freshman
 score  |   men    |       men       |  women   |      women
        | N = 273  |     N = 808     | N = 144  |     N = 477

   175  |     ?    |       99        |    98    |       99
   150  |    98    |       98        |    96    |       97
   125  |    97    |       96        |    91    |       93
   100  |    95    |       92        |    86    |       88
    75  |    91    |       88        |    80    |       81
    50  |    87    |       81        |    70    |       72
    25  |    82    |       75        |    58    |       63
     0  |    74    |       66        |    48    |       53
  - 25  |    64    |       56        |    38    |       42
  - 50  |    54    |       45        |    24    |       30
  - 75  |    44    |       33        |    14    |       22
  -100  |    30    |       23        |     9    |       11
  -125  |    20    |       14        |     4    |        5
  -150  |    12    |        7        |     2    |        2
  -175  |     5    |        3        |     1    |        1

F2-S

  Raw   | Flanagan | Temple freshman | Flanagan | Temple freshman
 score  |   men    |       men       |  women   |      women
        | N = 273  |     N = 808     | N = 144  |     N = 477

   100  |    97    |       98        |    98    |       99
    75  |    95    |       94        |    95    |       95
    50  |    90    |       89        |    92    |       91
    25  |    82    |       80        |    84    |       84
     0  |    69    |       70        |    72    |       73
  - 25  |    52    |       53        |    58    |       58
  - 50  |    33    |       34        |    37    |       40
  - 75  |    20    |       20        |    21    |       25
  -100  |    10    |       10        |    11    |       14
  -125  |     4    |        4        |     4    |        7
  -150  |     2    |        1        |     2    |        3
  -175  |     ?    |        ?        |     ?    |        2

The Bernreuter Personality Inventories were rescored according 
to the Flanagan weights. 





F2 scores as follows: “F1-C is a measure of confidence in oneself. Persons scoring
high . . . tend to be (too) self-conscious and to have feelings of inferiority . . .
Those scoring low tend to be wholesomely self-confident and to be very well
adjusted. . . .” “F2-S. A measure of sociability. Persons scoring high on
this scale tend to be non-social, solitary or independent. Those scoring low tend
to be sociable or gregarious.”

Table I shows in parallel columns the original Flanagan norms and
the Temple percentile norms derived from the population of this study 
and one hundred twenty-three inventories secured at a later date. 
The latter are employed subsequently because of the larger number of 
cases. 

The responses of a number of individual students to questionnaire
items indicated a definite possibility of deviation from the normal in
respect to certain personality traits. The F1 and F2 scores of these
individuals or groups of individuals are presented and analyzed in
the following tables and paragraphs. The scores are expressed in
percentiles. Accordingly, in the case of a normal group, one would
expect a median and a mean of fifty per cent and one would assume that
deviations from fifty per cent indicated some degree of abnormality.
On the same assumption one would expect deviations in the scores of a
group, when conditions are apparently conducive to abnormality, if
the test measures personality traits. In each instance the arithmetic
means, the medians, and the standard deviations of the F1-C and F2-S
scores are displayed for the group under discussion. In inspecting
these figures, it should be kept in mind that a normal group should
have a mean of fifty per cent. In a normal distribution the standard
deviation would be ±28.7, and the number of cases within each
interval would be equal when percentile scores, not raw scores, are
used.
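The expected standard deviation of percentile scores in a norm group can be checked against the rectangular distribution that such scores follow. The sketch below is illustrative only; the function name and the 0 to 100 range are assumptions. Both the continuous formula and a direct computation over the whole-number percentiles 1 to 100 give about 28.9, close to, though not identical with, the ±28.7 cited in the text.

```python
from math import sqrt
from statistics import pstdev


def rectangular_sd(low: float = 0.0, high: float = 100.0) -> float:
    """Standard deviation of a continuous rectangular (uniform) distribution on [low, high]."""
    return (high - low) / sqrt(12.0)


if __name__ == "__main__":
    print(round(rectangular_sd(), 2))                    # 28.87
    # Population SD of the whole-number percentiles 1 through 100.
    print(round(pstdev(list(range(1, 101))), 2))         # 28.87
```

Standard deviations well below this figure in the tables that follow therefore point to a group bunched in one part of the percentile range.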


TABLE II.—SERIOUS DISAGREEMENT IN THE HOME

                        No. of |          F1          |          F2
Disagreement between    cases  | Median  Mean    SD   | Median  Mean    SD

Father and self           15   |   48     48   ±29.8  |   76     69   ±24.2
Mother and self            9   |   65     57   ±27.8  |   75     68   ±28.4
Mother and father         29   |   46     50   ±28.9  |   59     54   ±26.6
Brother and self          15   |   65     62   ±29.9  |   45     50   ±31.2
[illegible]                8   |   35     48   ±33.8  |   40     39   ±17.3
Brother and sister         6   |   60     55   ±30.5  |   65     57   ±19.5
[illegible]               14   |   54     49   ±18.7  |   60     51   ±28.7
Total group               96   |   52     52   ±26.3  |   60     56   ±28.4

Table II displays the means, medians, and standard deviations of
Flanagan scores of students who reported serious disagreement in the
home. These means and medians, and those in subsequent tables, are
expressed in terms of the nearest whole number. According to
Flanagan, the F1 score indicates the degree of self-consciousness,
F2 non-sociability. Contrary to the results noted¹ with this group
when the B1 (neurotic tendency) score was employed, considerable
deviation in F1 is found in only three groups. Normally, one would
expect B1 and F1 findings to agree, and the nature of the group would
lead one to expect a degree of abnormality. In the F2 scores there is
considerable and consistent deviation from the norm, as one would
expect with such a group if the F2 score measures non-sociability.


TABLE III.—STUDENTS DESIRING INTERVIEWS

                        No. of |          F1          |          F2
Problems                cases  | Median  Mean    SD   | Median  Mean    SD

[illegible]               44   |   55     55   ±27.8  |   54     56   ±25.7
Family relations           5   |   75     67   ±18.3  |   75     75   ±18.97
[illegible]               23   |   62     63   ±30.2  |   72     65   ±25.5
[illegible]               61   |   60     57   ±30.3  |   56     56   ±26.
[illegible]               18   |   50     50   ±31.6  |   60     62   ±28.9
[illegible]              281   |   51     51   ±28.1  |   51     51   ±28.7
Total group              432   |   54     53   ±28.8  |   54     54   ±28.2

In Table III, which shows the mean and median scores of students 
who requested interviews to discuss certain specific problems, there 
is a consistent indication of both self-consciousness and non-sociability, 
according to the F1 and F2 scores, except in the group having financial 
problems. This is in accordance with normal expectations. In the 
actual interviews most of the students who had stated a desire to 
discuss sex problems desired simply to be referred to authentic sources 
of information. They had no personal sex problems. 

These data indicate self-consciousness and non-sociability in the 
cases of students who had infantile paralysis or heart trouble histories. 
Other diseases, according to these scores, seem to have had little effect 
upon personality. This is in agreement with what one might expect 
because of the nature of the illness. 





¹ Cf. St. Clair, W. F. and Seegers, J. C.: Ibid.


TABLE IV.—SERIOUS ILLNESSES

                        No. of |          F1          |          F2
Type of illness         cases  | Median  Mean    SD   | Median  Mean    SD

Infantile paralysis        9   |   7?     63   ±23.3  |   58     62   ±23.1
Heart trouble             30   |   67     59   ±31.   |   65     60   ±26.6
[illegible]              130   |   48     48   ±27.1  |   45     49   ±28.7
Other illnesses          192   |   51     50   ±27.7  |   51     51   ±28.2
Total group              361   |   51     50   ±27.9  |   50     51   ±28.2

TABLE V.—STUDENTS HAVING CHRONIC HEADACHES, DIGESTIVE TROUBLE, AND
SPEECH DEFECTS

                        No. of |          F1          |          F2
                        cases  | Median  Mean    SD   | Median  Mean    SD

Headaches                207   |   69     63   ±26.5  |   57     54   ±28.9
Digestive trouble         99   |   74     68   ±26.6  |   59     55   ±28.
Speech defects            47   |   61     57   ±31.1  |   58     53   ±27.8

The significance of headaches, digestive troubles, and speech defects
in relation to neurotic tendency is well recognized. Consequently,
the scores in Table V are quite in consonance with normal and
reasonable expectation.


TABLE VI.—STUDENTS HOLDING CLASS OFFICES IN HIGH SCHOOL

                        No. of |          F1          |          F2
Office                  cases  | Median  Mean    SD   | Median  Mean    SD

President                 68   |   53     49   ±29.8  |   52     48   ±30.
Vice-president            51   |   47     46   ±29.2  |   51     49   ±26.4
[illegible]               46   |   52     53   ±27.5  |   44     44   ±29.8
[illegible]               38   |   45     47   ±32.5  |   44     46   ±25.4
[illegible]               71   |   39     43   ±26.4  |   54     52   ±27.5
Total group              274   |   48     48   ±29.1  |   50     48   ±28.1


TABLE VII.—STUDENTS HOLDING OFFICES IN STUDENT GOVERNMENT IN HIGH
SCHOOL

                        No. of |          F1          |          F2
Office                  cases  | Median  Mean    SD   | Median  Mean    SD

President                 15   |   40     41   ±32.   |   68     62   ±27.6
Vice-president            17   |   45     48   ±29.1  |   45     45   ±26.3
[illegible]               12   |   40     43   ±23.1  |   40     48   ±34.2
[illegible]                8   |   70     54   ±36.6  |   55     60   ±27.4
[illegible]              137   |   44     44   ±27.9  |   55     53   ±29.1
Total group              189   |   43     44   ±30.2  |   55     53   ±29.2

As one would expect, the scores indicate that these student leaders 
exhibit very little self-consciousness. However, it is surprising to 
note in certain scores an indication of non-sociability, and especially 
is it odd that the group of presidents should exhibit such a tendency. 
Note also that the measures of dispersion in these tables are generally 
slightly higher than one would expect. The validity of these F2 
scores, as consistent measures of non-sociability, seems at least
questionable. 


TABLE VIII.—STUDENTS WHO HAVE EARNED MONEY PREVIOUSLY

                          No. of |          F1          |          F2
How much                  cases  | Median  Mean    SD   | Median  Mean    SD

Bought clothes             279   |   49     51   ±28.4  |   60     56   ±29.1
Spending money             396   |   50     51   ±27.9  |   52     52   ±29.5
Board at home               53   |   44     45   ±28.3  |   69     64   ±25.8
Board out of home           40   |   44     45   ±28.1  |   72     63   ±27.8
Occasional contributions   143   |   44     46   ±27.8  |   55     55   ±27.3

Total group                911   |   48     50   ±28.3  |   57     55   ±29.

The data in Table VIII corroborate Tables VI and VII. These
students who have earned money display, according to the scores,
little self-consciousness. But, if the F2 scores may be believed, they
consistently—in most groups in a high degree—are non-social and,
strangely, the more significant the contribution the greater the degree
of non-sociability. Just as one must wonder, in reviewing Tables
VI and VII, if presidents were actually non-social, or if the F2 score
measures some other trait, so these figures inevitably suggest a similar
query. Basing opinion upon a priori reasoning, it seems more likely,
more in accordance with ordinary observation, to suppose that the F2
scores are measuring, not non-sociability, but independence or
self-sufficiency. In justice it should be said that, according to Flanagan,
“persons scoring high on this (the F2) scale tend to be non-social,
solitary, or independent.” This is a rather wide range of categories.
The data in Tables VI, VII and VIII certainly serve to substantiate 
the element of independence, but there is not the slightest indication 
in these tables of non-sociability, aloofness, or withdrawal. The 
implications of Flanagan’s statement are that the individual would 
have a rather disagreeable personality. The implications of these 
tables are that independence is associated with high F2 scores, but no 
sinister interpretation seems at all justified. 

The evidence presented to this point indicates that the F1 score 
possesses a degree of validity as a measure of self-consciousness. Com- 
parison of these data with those presented in the study cited above 
suggests that the F1-C scores and the Bernreuter B1-N scores are
nearly identical and apparently measure identical or nearly identical
traits.¹
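The figure attached to the correlation of .90 for five hundred cases in the accompanying footnote appears to be a probable error, the convention usual at the time. On that assumption, which the article does not state, the sketch below applies the customary formula; the function name is hypothetical. It yields about .0057, close to the .0058 printed, and about .05 for a coefficient of .084 based on one hundred sixty cases.

```python
from math import sqrt


def probable_error_r(r: float, n: int) -> float:
    """Probable error of a product-moment correlation: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


if __name__ == "__main__":
    print(round(probable_error_r(0.90, 500), 4))   # 0.0057
    print(round(probable_error_r(0.084, 160), 2))  # 0.05
```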

The evidence presented agrees with Flanagan’s description, in that 
scores indicating non-sociability were registered by groups in which 
one would normally expect that trait. However, in certain cases the 
F2 scores seem to measure, not sociability, but desirable independence. 
The precise implications of the F2 scores seem to require further investi- 
gation, especially in their relation to other factors. To throw at least 
some light upon this problem, an analysis was made of the F2 scores 
of one hundred thirty-four students who, in their freshman year, 
joined sororities or fraternities, under the assumption that such 
students are a selected group likely to be social in nature. The F2 
scores of this group should be low, indicating sociability. Diagram 1 
shows the distribution of these scores. 

¹ The reported coefficient of correlation (Manual) between the B1 and F1
scales is .95 for one hundred fifty-seven cases. A correlation figure secured for
five hundred cases of the group being studied is .90 ± .0058.

In Diagram 1 several distinct features force themselves upon one’s
attention. First, the mean is not low, and the average sociability is
not what one would anticipate. Second, the distribution is not
symmetrical. In percentile scores, with an exactly normal population,
the step intervals would include equal numbers. With a small group,
with a selective factor deliberately introduced, this symmetry would
not be expected, but a multi-modal distribution such as is actually
found should by no means occur. Especially is it surprising to find a
heavy weighting of cases at the upper extremity of the range. In a
normal group one would expect to find twenty per cent of the cases
between 80-100. In this selected group we find 16.4 per cent in this
position. One wonders if so large a proportion of such students are so
highly non-social.

DIAGRAM 1.—F2 scores of one hundred thirty-four students who joined a fraternity
or sorority during their freshman year. (Vertical axis: number of cases.)

The Bernreuter B1-N (neurotic tendency), B2-S (self-sufficiency) 
and B4-D (dominance-submission) scores of the twenty-two students 
in this group studied in their inter-relationships suggest an interpreta- 
tion which is possibly significant. From the analysis of these scores 
two distinct patterns or profiles emerge. The scores of five students 
falling into one of these profiles or patterns are listed in Table IX. 

According to Bernreuter’s interpretation these scores indicate 
neurotic tendency and submissiveness. It should be remembered that 





¹ Cf. Farnsworth, Paul R.: “A Study of Bernreuter Profiles.” Psychological
Bulletin, Vol. XXX, 1933, pp. 600-601.


TABLE IX.—ILLUSTRATIONS OF PROFILE I

Case      B1      B2      B4

  1       91      46       6
  2       99      60       8
  3       82      49      30
  4       96      73      10
  5       82      64      27

Mean      90      58      16

Flanagan’s interpretation of the F2 scores of the individuals indicates 
non-sociability of high degree. The B2 scores vary, but note that, in 
each instance, the B2 score is much higher than the B4 score, the 
least difference being nineteen points. The profile presented is: 
B1 high, B4 low, B2 more than fifteen points higher than B4.
This combination of scores, it seems reasonable to postulate, might 
point out a degree of non-sociability characterized by both inde- 
pendence and aloofness. 
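The profile rule can be tried against the five cases of Table IX. The sketch below is illustrative: the text fixes only the margin of fifteen points between B2 and B4, so the fiftieth-percentile cutoff used here for "high" and "low" is an assumption, as are the function and variable names.

```python
# (B1, B2, B4) percentile scores of the five students listed in Table IX.
PROFILE_I_CASES = [(91, 46, 6), (99, 60, 8), (82, 49, 30), (96, 73, 10), (82, 64, 27)]


def matches_profile_one(b1: int, b2: int, b4: int, cutoff: int = 50, margin: int = 15) -> bool:
    """B1 high, B4 low, and B2 more than `margin` points above B4.

    `cutoff` (the 50th percentile) is an assumed dividing line, not given in the text.
    """
    return b1 > cutoff and b4 < cutoff and (b2 - b4) > margin


if __name__ == "__main__":
    differences = [b2 - b4 for _, b2, b4 in PROFILE_I_CASES]
    print(min(differences))                                        # 19
    means = [round(sum(col) / len(col), 1) for col in zip(*PROFILE_I_CASES)]
    print(means)                                                   # [90.0, 58.4, 16.2]
    print(all(matches_profile_one(*case) for case in PROFILE_I_CASES))  # True
```

The least difference of nineteen points and the column means agree with the figures given in the text and in the last row of Table IX.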

Fourteen other students in this group whose F2 scores were very 
high fall into another pattern. To economize space, the means and 
standard deviations, rather than the individual scores of these students 
are displayed in Table X. 


TABLE X.—ILLUSTRATION OF PROFILE II

            B1        B2        B4

Mean        51        83        74
SD        ±15.4     ±11.7     ±13.4

In these cases B1 is close to fifty, or normal, both B2 and B4 are 
high. Bernreuter would interpret these scores as indicating normal 
neurotic tendency, high self-sufficiency, independence, dominance. 
This by no means presents the impression of non-sociability or of 
unpleasant personality. It suggests, rather, desirable independence, 
possibly leadership.¹ If this is an accurate presentation, the F2
scores have not truthfully depicted these students. The Bernreuter, 





¹ The scores of two of the other three students in the high F2 score group
approach Profile I; that of the other resembles Profile II. Analysis is omitted to
conserve space.

rather than the Flanagan interpretation, is in accordance with what
would normally be expected of so large a proportion of the students 
who were selected by fraternities and sororities, and who chose to 
join them. 

To throw more light upon the theory that high F2 scores might 
not necessarily be indicative of non-sociability, but might rather be 
associated with desirable independence, the Bernreuter Inventory 
was administered to twenty-five student leaders. The F2 scores of 
thirteen of these students were so high as to indicate, according to 
Flanagan’s interpretation, non-sociability. These scores and the 
B1, B2, B4 scores of these students appear in Table XI. 


TABLE XI.—SCORES OF THIRTEEN STUDENT LEADERS ON BERNREUTER
PERSONALITY INVENTORY

Case        B1      B2      B4      F2

  1         56      99      82       ?
  2          2      99      98      95
  3          1      99       ?      95
  4          ?      96      99      94
  5          1      91       ?      86
  6          8      89      92      82
  7         18      69      77      82
  8         34      68      97      80
  9          3      91      96      78
 10          9      95      76      75
 11         39      70      67      75
 12          4      74      99      72
 13         10      68      97      64

Mean        15      85      91      84
Median       8      91      96      83
SD       ±16.6   ±12.8   ±10.1   ±10.2

Among the students whose scores appear in Table XI seven had 
been elected to important presidencies, two are editors, two are popular 
athletes, four are employed by the university in positions which require 
diplomacy and skill in handling people, five were in the list of the ten 
seniors declared by vote to be most popular, five have received impor- 
tant appointments, and one, whose F2 score is ninety-four, is president 
of the student-governing body. It seems obvious that students of 
this type should not be non-social. It is a fact that one man whose 
F2 score is ninety-five has been pointed out repeatedly by students
and faculty as a superlative example of sociability. He seeks and
apparently enjoys company. He goes out of his way to render favors. 
His scholastic record and record as a student president are excellent. 
Yet, his F2 score is ninety-five. 

The consistent trend toward low B1, high B2, B4, and F2 scores in a
group so selected indicates that the high F2 score is not necessarily
an indication of non-sociability, and that when associated with such a
Bernreuter profile is likely indicative of independence and leadership.¹


SUMMARY 


Examination of F1 and F2 scores (measures of confidence and
sociability) of certain students, whose responses to a questionnaire
might lead one to expect certain abnormal personality traits, indicates
that the F1 score possesses a degree of validity as a measure of
self-confidence. The F1 and B1 scores seem to measure nearly identical
traits. However, certain inconsistencies were apparent when the F2 
scores were analyzed. 

A multimodal distribution of the F2 scores of students who were 
selected for membership in fraternities and sororities and scores of a 
number of student leaders confirmed the impression that a high F2 
score is not consistently an indication of non-sociability. Examination 
and analysis of the Bernreuter scores of the same individuals indicate 
fairly definite profiles which seem to depict personality types. These 
profiles are determined by studying the Bernreuter B1, B2, and B4 
scores (measures of neurotic tendency, self-sufficiency and dominance) 
in their interrelationships and the B2 score seems especially important. 

Two profiles have been delineated tentatively. Profile I, it 
appears, might be associated with a withdrawal tendency. Profile II 
probably is indicative of leadership. In general, the evidence presented
strongly supports the view that the profile approach to the
interpretation of the Bernreuter scores presents a fruitful field for
research.





¹ It is of interest to note that a very capable man, not popular among students,
leader in a radical element, registered these scores: B1, 85; B2, 91; B4, 43; F2, 99.
Note that his F2 score is high, as are those in Table XI, but note how the Bernreuter
profile differs from those in Table XI and is in the category of Profile I.








THE INTERPRETATION OF IQ’S ON THE L-M
STANFORD-BINET


ROBERT G. BERNREUTER AND EDWARD J. CARR 


The Pennsylvania State College 


The IQ of an individual is most readily interpreted if it is thought 
of as a numerical expression of the rate of intellectual development 
of the individual. This is in contrast with the MA, which is a numeri- 
cal expression of the amount of intellectual development that has taken 
place. 

In interpreting IQ’s obtained on the L-M forms of the Stanford-Binet,
the descriptive table prepared by Terman (1, p. 79) for the
original Stanford-Binet cannot be used. This is because the spread
of the IQ’s is greater for the L-M than for the original test. Terman
and Merrill (3, p. 40) report a standard deviation of sixteen as typical
for a distribution of L-M IQ’s. In Terman (2, p. 42), a distribution of
the IQ’s obtained on the original test is given, from which the present
writers computed a standard deviation of twelve points.

In Table I, a comparison of the IQ’s of the two Stanford-Binet 
tests is presented. This table was constructed by the standard score 
technique. Standard deviations of twelve and sixteen were used for 
the original and L-M tests, respectively. It was assumed that the 
IQ’s were normally distributed. This table also contains percentile 
and standard deviation scores. The percentile scores above 99 are 
reported as “‘incidence”’ scores instead of the usual percentile scores. 
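The standard score technique described here amounts to expressing an L-M IQ as a deviation from 100 in units of sixteen points and re-expressing that deviation in units of twelve points. The following sketch shows the conversion; the function name and keyword arguments are assumptions, while the two standard deviations are those stated in the text.

```python
def lm_to_original_iq(lm_iq: float, sd_lm: float = 16.0, sd_original: float = 12.0,
                      mean: float = 100.0) -> float:
    """Convert an L-M IQ to the equivalent original Stanford-Binet IQ via standard scores."""
    standard_score = (lm_iq - mean) / sd_lm       # deviation in SD units on the L-M form
    return mean + standard_score * sd_original    # same deviation on the original form


if __name__ == "__main__":
    for lm_iq in (180, 132, 100, 60):
        print(lm_iq, round(lm_to_original_iq(lm_iq)))
    # 180 160
    # 132 124
    # 100 100
    # 60 70
```

An L-M IQ of 132, two standard deviations above the mean, thus corresponds to 124 on the original form.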

In Table II, there is presented a set of descriptive terms for aid in 
interpreting L-M IQ’s. It was constructed upon the assumption 
that the MA of an average adult is fifteen years, and that the rate of 
development is constant. From the percentage of the general popula- 
tion which each of the terms describes, it is apparent that a compromise 
has been made between the customarily accepted distribution of such 
percentages and convenient round numbers on the scale of IQ’s. 
The term “‘normal”’ is applied to a somewhat larger number of persons 
than was true on the original scale, but still includes somewhat fewer 
than would have been included had the limits been taken as plus and 
minus one standard deviation. 
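The remark about the "normal" category can be illustrated numerically. The sketch below assumes, as the text does, an average adult MA of fifteen years, a standard deviation of sixteen IQ points, and a normal distribution; the function names are hypothetical. It converts the adult MA limits of 12.75 and 17.25 years into IQ limits and compares the share of cases between them with the share within one standard deviation of the mean.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative probability of the unit normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def iq_from_adult_ma(adult_ma: float, average_adult_ma: float = 15.0) -> float:
    """IQ implied by an ultimate adult MA when development proceeds at a constant rate."""
    return 100.0 * adult_ma / average_adult_ma


def share_between(low_iq: float, high_iq: float, mean: float = 100.0, sd: float = 16.0) -> float:
    """Proportion of a normal IQ distribution lying between two limits."""
    return normal_cdf((high_iq - mean) / sd) - normal_cdf((low_iq - mean) / sd)


if __name__ == "__main__":
    low, high = iq_from_adult_ma(12.75), iq_from_adult_ma(17.25)
    print(low, high)                               # 85.0 115.0
    print(round(share_between(low, high), 3))      # roughly 0.65
    print(round(share_between(84.0, 116.0), 3))    # 0.683, i.e. plus and minus one SD
```

Under these assumptions the "normal" band holds roughly 65 per cent of the population, near the 66 per cent tabled and somewhat below the 68 per cent that limits of one standard deviation would give.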

TABLE I.—EQUIVALENT IQ’S FOR L-M AND ORIGINAL FORMS OF THE STANFORD-BINET,
WITH STANDARD DEVIATION VALUES AND PERCENTILE SCORES

     L-M Binet             Old |    L-M Binet    Old |    L-M Binet    Old |  L-M Binet Old
 IQ   SD   Incidence       IQ  |  IQ   SD  Pct.  IQ  |  IQ   SD  Pct.  IQ  |  IQ   SD   IQ

180  5.00                  160 |
179  4.94                  159 | 139  2.44      129 |  99   .06  48   99 |  59  2.56   69
178  4.88  1 in 1,000,000  158 | 138  2.38      128 |  98   .12  45   98 |  58  2.62   68
177  4.81                  158 | 137  2.31      128 |  97   .19  43   98 |  57  2.69   68
176  4.75                  157 | 136  2.25  99  127 |  96   .25  40   97 |  56  2.75   67
175  4.69                  156 | 135  2.19  99  126 |  95   .31  38   96 |  55  2.81   66
174  4.62                  156 | 134  2.12  98  126 |  94   .38  35   96 |  54  2.88   66
173  4.56                  155 | 133  2.06  98  125 |  93   .44  33   95 |  53  2.94   65
172  4.50                  154 | 132  2.00  98  124 |  92   .50  31   94 |  52  3.00   64
171  4.44  1 in 100,000    153 | 131  1.94  97  123 |  91   .56  29   93 |  51  3.06   63
170  4.38                  152 | 130  1.88  97  122 |  90   .62  27   92 |  50  3.12   62
169  4.31                  152 | 129  1.81  96  122 |  89   .69  25   92 |  49  3.19   62
168  4.25                  151 | 128  1.75  96  121 |  88   .75  23   91 |  48  3.25   61
167  4.19                  150 | 127  1.69  95  120 |  87   .81  21   90 |  47  3.31   60
166  4.12                  150 | 126  1.62  95  120 |  86   .88  19   90 |  46  3.38   60
165  4.06                  149 | 125  1.56  94  119 |  85   .94  17   89 |  45  3.44   59
164  4.00                  148 | 124  1.50  93  118 |  84  1.00  16   88 |  44  3.50   58
163  3.94                  147 | 123  1.44  93  117 |  83  1.06  14   87 |  43  3.56   57
162  3.88                  146 | 122  1.38  92  116 |  82  1.12  13   86 |  42  3.62   56
161  3.81                  146 | 121  1.31  90  116 |  81  1.19  12   86 |  41  3.69   56
160  3.75  1 in 10,000     145 | 120  1.25  89  115 |  80  1.25  11   85 |  40  3.75   55
159  3.69                  144 | 119  1.19  88  114 |  79  1.31  10   84 |  39  3.81   54
158  3.62                  144 | 118  1.12  87  114 |  78  1.38   8   84 |  38  3.88   54
157  3.56                  143 | 117  1.06  86  113 |  77  1.44   7   83 |  37  3.94   53
156  3.50                  142 | 116  1.00  84  112 |  76  1.50   7   82 |  36  4.00   52
155  3.44                  141 | 115   .94  83  111 |  75  1.56   6   81 |  35  4.06   51
154  3.38                  140 | 114   .88  81  110 |  74  1.62   5   80 |  34  4.12   50
153  3.31                  140 | 113   .81  79  110 |  73  1.69   5   80 |  33  4.19   50
152  3.25                  139 | 112   .75  77  109 |  72  1.75   4   79 |  32  4.25   49
151  3.19                  138 | 111   .69  75  108 |  71  1.81   4   78 |  31  4.31   48
150  3.12                  138 | 110   .62  73  108 |  70  1.88   3   78 |  30  4.38   48
149  3.06                  137 | 109   .56  71  107 |  69  1.94   3   77 |  29  4.44   47
148  3.00  1 in 1,000      136 | 108   .50  69  106 |  68  2.00   2   76 |  28  4.50   46
147  2.94                  135 | 107   .44  67  105 |  67  2.06   2   75 |  27  4.56   45
146  2.88                  134 | 106   .38  65  104 |  66  2.12   2   74 |  26  4.62   44
145  2.81                  134 | 105   .31  62  104 |  65  2.19   1   74 |  25  4.69   44
144  2.75                  133 | 104   .25  60  103 |  64  2.25   1   73 |  24  4.75   43
143  2.69                  132 | 103   .19  57  102 |  63  2.31   1   72 |  23  4.81   42
142  2.62                  132 | 102   .12  55  102 |  62  2.38   1   72 |  22  4.88   42
141  2.56                  131 | 101   .06  52  101 |  61  2.44   1   71 |  21  4.94   41
140  2.50                  130 | 100   .00  50  100 |  60  2.50   1   70 |  20  5.00   40


In evaluating the revised limits for the various levels of
feeble-mindedness, the probable ultimate mental levels are significant. An
idiot is a person who will, when an adult, probably have an MA of
three years or less. An imbecile is one who probably will not attain
an MA high enough to learn to read. A moron is one who can master
a certain amount of school work, but whose mental level will be below
that of the average fifth-grade child. Few pupils in public-school
special classes for mentally retarded children are able to go beyond
the fourth grade in academic achievement.

In both tables the incidence of IQ’s below the one percentile position 
is omitted. It is very likely that any estimate based solely upon the 
normal distribution of scores would be greatly in error at the lower end 
of the distribution. This is due to the effect of such factors as birth 
injuries, glandular imbalances, and sensory defects which interfere 
with normal mental development. 
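
The normal-curve estimates discussed above can be illustrated with a short computation. The following Python sketch is an illustration only and is not the authors' procedure: it assumes a mean IQ of 100 and a standard deviation of 16 (the value implied by tabulated pairs such as IQ 120 at 1.25 sigma), and the function names are hypothetical. It converts an IQ to a sigma score and to a percentile under a strictly normal distribution, which is the kind of estimate the text warns is likely to be in error at the lower end.

```python
from statistics import NormalDist

MEAN_IQ = 100.0  # assumed mean of the IQ distribution
SD_IQ = 16.0     # assumed SD, inferred from table rows such as IQ 120 <-> 1.25 sigma


def sigma_score(iq):
    """Distance of an IQ from the mean, in standard-deviation units."""
    return (iq - MEAN_IQ) / SD_IQ


def percentile(iq):
    """Per cent of a strictly normal population falling below this IQ."""
    return 100.0 * NormalDist(MEAN_IQ, SD_IQ).cdf(iq)


# Print IQ, sigma score, and rounded percentile for a few table rows.
for iq in (128, 120, 100, 88):
    print(iq, round(sigma_score(iq), 2), round(percentile(iq)))
```

Under these assumptions the rounded percentiles for IQ's of 128, 120, and 88 agree with the tabulated values of 96, 89, and 23. No such agreement should be expected below the first percentile, where the factors named above disturb the normal form of the distribution.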


TABLE III. DESCRIPTIVE CLASSIFICATION OF THE L-M STANFORD-BINET IQ’S

Range of IQ | Descriptive term | Approximate per cent | Range of probable adult MA
[illegible] | near genius      | 1                    | 22.50 and above
[illegible] | very superior    | 3                    | 19.50-22.49
[illegible] | superior         | 14                   | 17.25-19.49
[illegible] | normal           | 66                   | 12.75-17.24
[illegible] | dull             | 14                   | 10.50-12.74
[illegible] | borderline       | 2                    | 9.00-10.49
[illegible] | moron            | 1                    | 6.00-8.99
[illegible] | imbecile         | [illegible]          | 3.00-5.99
[illegible] | idiot            |                      | 0.00-2.99

It must be recognized that this schema has not yet been clinically 
or experimentally verified. It is merely a logical and statistical 
development of the data furnished by Terman and Merrill (3) and is 
based upon the stated assumptions. 
BOOK REVIEWS 


MATTHEW LUCKIESH AND FRANK K. MOSS. The Science of Seeing. 
New York: D. Van Nostrand Co., 1937, pp. X + 548. 


This book appears at an opportune time. During the last three 
years information from various sources has raised important questions 
concerning hygienic vision, especially with reference to lighting. The 
authors have been intimately associated with this work on the science 
of seeing in their productive program of research and writing at the 
General Electric Lighting Research Laboratory. In their treatise 
there is a constant emphasis upon the “human seeing-machine” or 
the expenditure of energy by the total human being in performing a 
visual task. Many physiological and psychological variables must be 
considered in the complex task of seeing if factors of safety in vision 
are to be maintained. 

Certain chapters of the book are excellent (4, 7, 10, 11). The 
discussion of visual thresholds is one of the best short treatments of 
the subject in print. In considering conservation of vision and 
achievement there is a well-balanced emphasis upon near-vision work 
in relation to eye defects, exacting visual tasks, illumination, and the 
like. A sound treatment of the basic factors in distribution of brightness 
within the visual field is given under “quality of lighting.” The 
discussion of spectral quality of light is critical and adequate. 

The critical reader, however, will not accept all the conclusions 
and recommendations on light intensities. It is argued that, since 
the eye evolved to see in bright daylight, hundreds or even thousands 
of foot-candles of illumination are needed for easiest seeing. No 
mention is made of the fact that the eye also was evolved so that it 
adapts itself to clear seeing at widely different levels of brightness 
from relatively low to very high intensities. Recommendations which 
the authors consider conservative range from five to ten foot-candles 
for casual seeing, to one hundred or more foot-candles for fine dis- 
crimination. Ordinary reading is supposed to need twenty to fifty 
foot-candles. These recommendations are derived from a group of 
studies in which a series of human reactions (variation in muscular 
tension at finger tips, blinking, heart-rate, pupil diameter, etc.) were 
recorded while the subject read under one and one hundred foot- 
candles, or under one, ten, and one hundred foot-candles. More strain 
or other indexes of greater fatigue occurred at ten than at one hundred 
foot-candles, and greater differences occurred between one and one 
hundred foot-candles. In an analysis published elsewhere the reviewer 
has pointed out that the authors have been led to erroneous conclusions 
by the manner of treating their data. Measurements of response 
should have been made at other brightnesses between one and one 
hundred foot-candles. Interpolation of the authors’ data reveals 
marked changes in the response of subjects from a low brightness up 
to about five foot-candles, less marked changes from five to ten, very slight 
changes from ten to about twenty, and very little change thereafter. It is 
clear that all the changes that are of practical importance occurred at 
relatively low illumination intensities. Obviously, there is no justifica- 
tion in their data, which form the foundation for their whole argument, 
for specifying intensities above about fifteen foot-candles unless there 
is a special situation such as illegible print, defective eyes, or dis- 
crimination of fine details. Their data on preferences for intensity 
of light for comfortable reading have likewise been misinterpreted. 

Relative readability of different sizes of type, different type faces, 
and the like was determined by the Luckiesh-Moss Visibility Meter. 
Scores are given in terms of the relative visibility of the types. The 
foot-candles necessary to make the types equally visible are then 
indicated. In these determinations it is assumed that visibility is 
highly correlated with readability. It has been shown by the reviewer 
in published research that this is not so. Type, due to variation in 
size or in type face, may be relatively highly visible but produce 
comparatively slow reading, or there may be no variation in speed of 
reading but a wide range in visibility. Only when relative visibility 
is due to brightness contrast between print and background are visibility 
and readability intimately associated. Many of the authors’ sugges- 
tions concerning the adequate brightness of light equated for particular 
visual tasks, therefore, are invalid. Such determinations need to 
be made by a different technique. 

The failure to consider many important contributions from other 
laboratories weakens the book considerably, especially in the dis- 
cussions of legibility of print and factors in hygienic illumination. 
The authors tend to dismiss findings contradictory to their own with 
ridicule rather than after analysis. 

Contradictions are present. In general, the authors are opposed 
to measuring visual efficiency in terms of how much work can be 
accomplished in a set time. They advocate measuring the energy 
expended during visual work, which they claim is more representative 
of the efficiency of the human seeing-machine. Later, however, they 
cite work of their own in which measurement of visual efficiency is in 
terms of amount of work completed within a set time. 

Errors due to misinterpretation of, or unfamiliarity with, recent 
literature occur. Examples: (1) citation of color-blind frequency as 
four or five instead of eight; (2) interpretation of Gilliland’s data as 
indicating that eye movements are not criteria of readability of type. 
The reviewer has shown that there is practically a one-to-one relation- 
ship between eye movements and other measures of readability or 
legibility of type. 

The careful reader will find many praiseworthy things in this book. 
The authors’ long history of contributions to the science of seeing has 
given them high prestige in the field. Such prestige, however, should 
not prevent a critical examination of their techniques of measurement 
and the validity of their conclusions. MILES A. TINKER. 

University of Minnesota. 


Marion Monroe and Bertie Backus. Remedial Reading. A 
Monograph in Character Education. Boston: Houghton Mifflin 
Co., 1937, pp. XI + 171. 

May Lazar. Reading Interests, Activities, and Opportunities of Bright, 
Average, and Dull Children. New York: Bureau of Publications, 
Teachers College, Columbia University, 1937, pp. 127. 


Among all of the subjects taught in schools, or among all the 
skills required for success in our social culture, none is more important 
than reading. While the importance of reading has, of course, been 
long recognized, it is during the last two decades or so that a great deal 
of attention has been paid to certain aspects of the subject. Perhaps 
the two areas of greatest interest to investigators have been the 
disabilities in reading exhibited by many school children, and reading 
interests, and facilities for satisfying them, of both children and 
adults. The two short monographs being reviewed are excellent 
contributions to these two areas of investigation. 

Any contribution in the field of reading disability by Dr. Monroe 
is certain to bear the stamp of authority. The present instance is no 
exception. This monograph is a description of an extensive, although 
experimental, program in remedial reading carried on as part of a 
larger program of character education in the public schools of Washing- 
ton, D. C. A short introductory chapter establishes the logic of 
remedial reading in a program of character education. The next two 
chapters summarize the general principles of diagnosis and remedial 
instruction in reading. The second chapter, on diagnosis, will be of 
the utmost value to every classroom teacher, and is easily the best 
non-technical summary of the field available in print. In the four 
remaining chapters are described in detail the methods used in the 
experimental elementary, junior high, senior high, and vocational 
schools for remedial instruction in reading. While these descriptions 
are not intended to be exhaustive, they are sufficiently complete in 
detail to be of great value to school administrators and teachers. 
The small size of this book (five and one-fourth by seven and one-half 
inches) makes one feel that it should become the constant pocket 
companion of at least every elementary-school teacher. 

Dr. Lazar’s report deals with an investigation of the reading 
interests of two thousand children from thirteen public schools located 
in three boroughs of New York City. An analysis is made of the 
reading interests of boys and girls, and of the differences between 
three ability level groups for each sex. The results show that reading 
interests are related to the intelligence of the child, dull pupils choosing 
simpler and less realistic types of reading. There were also evident 
sex differences in the sort and number of books read. In a large meas- 
ure the voluntary reading of children is influenced by their home 
conditions and the ease of access to library facilities. The discrepancy 
between the kinds of books the children prefer and the kinds that are 
selected for them by experts in children’s literature suggests that 
something is wrong. Either the experts had better revise their basis of 
judgment or there needs to be better publicity for the types of books they 
suggest. The data reported in this monograph are valuable con- 
tributions to the growing material on children’s reading that is of 
importance not only to the schools, but also to librarians and parents. 

Indiana University. C. M. LOUTTIT. 


J. Stanley Gray. Psychological Foundations of Education. New 
York: American Book Company, 1927, pp. XII + 534. 


The viewpoint of the author of this recent textbook in educational 
psychology is perhaps best expressed in his statement that “Educational 
psychology must ever consider itself to be merely a branch of 
biology” (p. 36). The emphasis throughout the book is on a mecha- 
nistic and behavioristic approach to human nature and education. 
The first section of the book, comprising approximately half of 
the volume, deals with the nature of man. In this section, most of the 
traditional topics, including instincts, emotions, heredity and environ- 
ment, learning and the nature of intelligence, are considered. The 
nature of man, according to Gray, is a product neither of heredity 
alone nor of environment alone, but of a combination of the two, for 
which union he invents the term “herediviron.” 

Learning and maturation, in the view of the author, are not two 
distinct processes; rather, both consist in structural modification, and 
differ only in their causes. The theory is advanced that learning takes 
place when an organism is stimulated to behave in the desired way for a 
length of time sufficient to bring about the necessary structural 
changes. Thorndike’s law of effect is discarded on the familiar ground 
that learning takes place during behavior, and that an effect cannot 
operate on a completed behavior series. In considering the nature of 
intelligence, Gray rejects, after a brief criticism, the Gestalt, Thorn- 
dike, and Spearman viewpoints. As a definition of intelligence, he 
proposes “That type of behavior which is caused or directed by the 
accurate prediction of future conditions” (p. 259). 

The second section of the book is concerned with the nature of 
education. This latter half of the volume includes chapters on the 
nature of social institutions, the content and methods of education, 
tests and measurements, and educational counseling. Gray sets as 
primary objectives of education the development of the individual 
in the specialized field for which he is best suited, and the inculcation 
of a coöperative attitude which will enable him to function effectively 
as a social unit. “Intelligent socialized behavior” should be the 
outcome of education, and to best achieve this outcome, school 
procedure should be in the form of problem solving, not fact amassing. 

Two chapters are devoted to tests and measurements; emphasis 
is placed on personality and character testing and its importance in the 
field of educational counseling. The treatment of test validity and 
test reliability is rather brief, with little consideration given to the 
statistical aspects of these problems; however, the lack of emphasis 
on these topics is, perhaps, justifiable in a text such as this, intended 
only as a “foundations” course in educational psychology. 

The insistence on the biological approach has resulted inevitably 
in a rather limited view of the field. How many educational psy- 
chologists will agree, for instance, with the statement quoted at the 
beginning of this review as to the relation between psychology and 
biology? The author’s rejection of all non-materialistic realities 
leads him to make the somewhat extravagant statement that “The 
thousands of pages which have been written about the mind have not 
contributed one whit of knowledge about it” (p. 38). Unfortunate, 
too, are certain misapprehensions which have crept into the book 
concerning such matters as the attitude of religion toward scientific 
truth, and the doctrine of free will—topics, which, incidentally, 
might have been omitted from an educational psychology with no 
perceptible loss. What appears to be a somewhat disproportionate 
emphasis has been given in the first part of the book to a consideration 
of the philosophical aspects of psychology, and, in the latter half, to 
the discussion of the nature of social institutions. 

Although the author claims that his is an “objective” psychology 
of education based on scientific evidence, there is a noticeable lack of 
experimental data presented in support of the theories advanced in 
the second section of the book. Authorities are quoted liberally, 
but very few of the references are to experimental studies. 

A departure from usual book-making practise has been made in 
designating the sections of the book “Problems” instead of the 
conventional “Chapters.” An annotated bibliography is supplied 
at the end of each “problem.” ROGER T. LENNON. 

Bronx, N. Y. 





